Re: [ontolog-forum] Semantic Enterprise Architecture

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Fri, 03 Sep 2010 18:12:39 -0400
Message-id: <4C8172D7.7030604@xxxxxxxxxxxxxx>
  On 9/3/10 5:27 PM, John F. Sowa wrote:
> On 9/2/2010 2:59 PM, sean barker wrote:
>> With regard to Wikipedia, one might also point to DBpedia
>> (http://dbpedia.org/About), in which the infoboxes from Wikipedia are
>> translated to RDF and can then be queried via a SPARQL endpoint.
> Yes, but DBpedia uses RDF and SPARQL.  I cited Wikipedia Miner and JWPL
> (Java Wikipedia Library) because they process the original source texts
> and they do *not* use any of the tools developed for the Semantic Web.
>
> The point I was trying to make is that the Semantic Web has been around
> for over a decade, and the rate of adoption of their tools has been
> slow.  Recently, people have pointed to LOD as an application of SemWeb
> tools.  But the jury is still out.
John,

Linked Data applies the Data Access by Reference pattern, together with
structured entity descriptions, to one aspect of the Semantic Web project
(basically, its Data Web or Web of Data foundation). Fundamentally, it adds
a "Webby" dimension to the time-tested EAV (Entity-Attribute-Value) model
by using dereferenceable HTTP URIs as names. Thus, it's primarily about the
"Web" aspect of the "Semantic Web" misnomer.
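A minimal sketch of that idea, under stated assumptions: an EAV store whose entity and attribute names are HTTP URIs, with a local dereference() method standing in for an HTTP GET on the URI. The class, its methods, and the example.org URIs are hypothetical; the rdfs:label and foaf:knows property URIs are the standard ones.

```python
from collections import defaultdict

# Hypothetical sketch: an in-memory EAV (entity-attribute-value) store in which
# entities and attributes are named by HTTP URIs, as in Linked Data. A real
# deployment would answer an HTTP GET on the entity URI; dereference() is only
# a local stand-in for that lookup.
class EAVStore:
    def __init__(self) -> None:
        self._rows = defaultdict(list)  # entity URI -> [(attribute URI, value)]

    def add(self, entity: str, attribute: str, value: str) -> None:
        self._rows[entity].append((attribute, value))

    def dereference(self, uri: str) -> list:
        # .get() does not create an entry for unknown URIs.
        return list(self._rows.get(uri, []))


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
ALICE = "http://example.org/alice"
BOB = "http://example.org/bob"

store = EAVStore()
store.add(ALICE, RDFS_LABEL, "Alice")
store.add(ALICE, FOAF_KNOWS, BOB)  # the value is itself a URI, i.e. a link
store.add(BOB, RDFS_LABEL, "Bob")

# "Following a link" means dereferencing a value that is itself a URI.
for attribute, value in store.dereference(ALICE):
    if value.startswith("http://"):
        print(attribute, "->", store.dereference(value))
```

The point of the design is that the name of an entity doubles as the address of its description, so the same lookup serves both identification and navigation.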

Naturally, your comments are oriented more toward the "Semantic" part of
the "Semantic Web" misnomer, and in this particular case NLP tools (of the
kind you describe) come into play for Wikipedia content.

> Note that Wikipedia Miner and JWPL use automated methods for extracting
> information from the Wikipedia and build a relational DB for high-speed
> processing of that information.  These tools are also in their infancy,
> but the developers found MySQL to be more useful than RDF and SPARQL,
> and they made their tools freely available to anybody else.
No.

MySQL doesn't cut it at all. Neither does Oracle or any other
traditional RDBMS. You need a hybrid DBMS, e.g., OpenLink Virtuoso.
Remember, Virtuoso is the DBMS behind DBpedia (a little Linked Data Space)
and the massive Linked Open Data Cloud Cache (a 17-billion-triple Linked
Data Space).

We will publish a paper next week (the link will be posted to this list)
about how we've continued down the path of mixed-model DBMS technology
that delivers the best of both worlds.
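As a rough illustration of what "mixed model" can mean, here is a sketch using Python's built-in sqlite3 module: relational rows and RDF-style triples live in one database, and a view presents both as triples that a single SQL query can search. The table names, the view, and the URI scheme are illustrative assumptions; this is not a description of Virtuoso's internals.

```python
import sqlite3

# Hypothetical sketch of a "mixed model" store: ordinary relational rows and
# RDF-style triples share one database, and a view exposes both as triples.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE triple (s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL);

    -- Map each relational row to a triple whose subject is a minted URI,
    -- then union it with the natively stored triples.
    CREATE VIEW all_triples AS
        SELECT 'http://example.org/person/' || id AS s,
               'http://xmlns.com/foaf/0.1/name'   AS p,
               name                               AS o
        FROM person
        UNION ALL
        SELECT s, p, o FROM triple;
    """
)
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Alice')")
conn.execute(
    "INSERT INTO triple VALUES (?, ?, ?)",
    ("http://example.org/person/1", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/person/2"),
)

# One SQL query now answers a question that spans both models.
rows = conn.execute(
    "SELECT p, o FROM all_triples WHERE s = ? ORDER BY p",
    ("http://example.org/person/1",),
).fetchall()
print(rows)
```

Here the relational table stays the system of record for the name, while the triple table holds open-ended facts that would not fit a fixed schema.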

> At our VivoMind company, we accept RDF and OWL as input, but we map
> them to CGIF and Prolog for more concise and efficient processing.
>
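A hypothetical sketch of the general idea of mapping RDF input to Prolog, not VivoMind's actual code: each triple (S, P, O) becomes the fact P(S, O), with atoms quoted wherever Prolog syntax requires it.

```python
import re

# Hypothetical sketch: map RDF-style triples (subject, predicate, object) to
# Prolog facts by moving the predicate to the front: predicate(subject, object).
# This illustrates the general idea only; it is not VivoMind's actual mapping.
_PLAIN_ATOM = re.compile(r"[a-z][A-Za-z0-9_]*")

def prolog_atom(name: str) -> str:
    """Render name as a Prolog atom, quoting it unless it is a plain lowercase atom."""
    if _PLAIN_ATOM.fullmatch(name):
        return name
    # Quoted atom: escape backslashes first, then single quotes.
    escaped = name.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"

def triples_to_prolog(triples):
    return [f"{prolog_atom(p)}({prolog_atom(s)}, {prolog_atom(o)})."
            for s, p, o in triples]

facts = triples_to_prolog([
    ("alice", "knows", "bob"),
    # URIs and capitalized strings must be quoted; an unquoted name that
    # starts with a capital letter would be read as a Prolog variable.
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
])
print("\n".join(facts))
```

The quoting step is the only subtle part: without it, a literal such as Alice would silently turn into a variable and match anything.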
> Many other companies of varying sizes do something similar.  The
> largest is Google.  They also accept RDF as input, but the Google
> tool set uses JSON, not RDF, for storing triples, arbitrary n-tuples,
> and tagged URIs, which may be stored in n-tuples or arrays.
>
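To make the contrast concrete, here is an illustration (an assumption, not Google's actual format) of triples and n-tuples stored as plain JSON arrays. In RDF, by contrast, a relation with more than two arguments is usually broken into several triples around an intermediate node.

```python
import json

# Hypothetical illustration, not Google's actual format: triples and arbitrary
# n-tuples stored as JSON arrays. A triple is a 3-element array; an n-ary
# relation needs no auxiliary node, it is simply a longer array.
records = [
    ["http://example.org/alice", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/bob"],
    # A 4-tuple: relation plus three arguments (giver, recipient, thing given).
    ["http://example.org/gave", "http://example.org/alice",
     "http://example.org/bob", "http://example.org/book1"],
]

text = json.dumps(records)
decoded = json.loads(text)
arities = [len(r) for r in decoded]
print(text)
print(arities)
```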
> Even the W3C is getting the message that RDF may be a bit bloated
> and inefficient.  They recommend other notations, such as N3,
> for human consumption; they have already approved RDFa as a
> vastly simpler notation than RDF; and they are considering
> other optional representations.
>
> I'm in favor of using URIs to link data, but I wouldn't bet on
> the current XML-based notation for RDF as a long-term solution.
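The separation between RDF's triple model and its XML notation can be shown with a short sketch: the same triple is written as N-Triples (a line-based format that is essentially a subset of Turtle and N3) and as RDF/XML, and the XML is parsed back to the identical triple. Only standard-library modules are used; the helper functions and example.org URIs are hypothetical, and the RDF/XML writer handles just one triple with a URI object.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Sketch: one triple written in two syntaxes, N-Triples and RDF/XML, and read
# back from the XML. Simplifications (assumptions): URI objects only, and the
# predicate's local name must be a valid XML name.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def split_uri(uri):
    # Split a predicate URI at its last '#' or '/' into (namespace, local name).
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]

def to_ntriples(s, p, o):
    return f"<{s}> <{p}> <{o}> ."

def to_rdfxml(s, p, o):
    ns, local = split_uri(p)
    return (
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ns0={quoteattr(ns)}>'
        f"<rdf:Description rdf:about={quoteattr(s)}>"
        f"<ns0:{local} rdf:resource={quoteattr(o)}/>"
        "</rdf:Description></rdf:RDF>"
    )

def triple_from_rdfxml(text):
    desc = ET.fromstring(text)[0]          # the single rdf:Description
    prop = desc[0]                         # its single property element
    ns, local = prop.tag[1:].split("}")    # ElementTree tags look like "{ns}local"
    return (desc.get(f"{{{RDF_NS}}}about"), ns + local,
            prop.get(f"{{{RDF_NS}}}resource"))

triple = ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows",
          "http://example.org/bob")
print(to_ntriples(*triple))
print(to_rdfxml(*triple))
```

Both strings carry exactly one subject, predicate, and object; everything else in the XML is packaging.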

RDF and RDF/XML aren't inextricably linked. We take URIs in and
manage the data in a mixed-model DBMS. As the paper will show, we also
put column-store technology to innovative use, among other things.
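A rough sketch of the column-store idea applied to triples, assuming nothing about Virtuoso's actual design: terms are dictionary-encoded as integers and kept in three parallel columns, so a filter on predicate and object reads only those two columns. The class and method names are hypothetical.

```python
from array import array

# Hypothetical sketch of a column-oriented triple layout, not Virtuoso's design.
# Terms are dictionary-encoded as integers, and subjects, predicates, and objects
# are kept in three parallel integer arrays (columns) rather than as row tuples.
class ColumnTripleStore:
    def __init__(self):
        self.terms = []          # id -> term
        self.ids = {}            # term -> id
        self.s = array("q")      # subject column
        self.p = array("q")      # predicate column
        self.o = array("q")      # object column

    def _encode(self, term):
        if term not in self.ids:
            self.ids[term] = len(self.terms)
            self.terms.append(term)
        return self.ids[term]

    def add(self, s, p, o):
        self.s.append(self._encode(s))
        self.p.append(self._encode(p))
        self.o.append(self._encode(o))

    def subjects_with(self, p, o):
        # The filter reads only the p and o columns; the s column is touched
        # only at matching positions.
        pid, oid = self.ids.get(p), self.ids.get(o)
        if pid is None or oid is None:
            return []
        return [self.terms[self.s[i]] for i in range(len(self.p))
                if self.p[i] == pid and self.o[i] == oid]


KNOWS = "http://xmlns.com/foaf/0.1/knows"
store = ColumnTripleStore()
store.add("http://example.org/alice", KNOWS, "http://example.org/carol")
store.add("http://example.org/bob", KNOWS, "http://example.org/carol")
store.add("http://example.org/carol", KNOWS, "http://example.org/alice")
print(store.subjects_with(KNOWS, "http://example.org/carol"))
```

Dictionary encoding keeps each column compact and makes comparisons integer-only, which is a large part of why columnar layouts suit triple data.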


Kingsley
> Following is a note that I sent to another email list.
>
> John
>
> -------- Original Message --------
> Subject: How to specify a standard
> Date: Thu, 02 Sep 2010 22:17:03 -0400
> From: John F. Sowa<sowa@xxxxxxxxxxx>
> To: architecture-ecosystem@xxxxxxx
>
> On 9/2/2010 11:41 AM, Ed Barkmeyer wrote:
>   >  If one standard serialization format is good, surely 7 such standards
>   >  is much better.  In point of fact, one standard serialization format
>   >  that is no one's favorite will guarantee successful interchange,
>   >  while two standards will make interchange more difficult, and more
>   >  than two makes it impossible.
>
> The first sentence may or may not be true, but the second sentence
> is *false* in general.  There are indeed many cases for which it is
> true, but the cases that are essential for OMG and the Semantic Web
> are ones in which that statement is false.
>
> The cases for which it is true include software systems that define
> the syntax precisely, but leave the semantics vague.  As an example,
> consider the C language, which was developed at AT&T to implement
> Unix on a PDP-11.
>
> The syntax of C was well defined, but the semantics was very loose.
> However, the Unix system served as a very large suite of test cases
> that imposed strong constraints on the C semantics.  Any compiler
> for C that could generate code for running Unix on another platform
> had to be strongly compatible with the original C compiler.  Even
> then, there were loose ends that required further debugging on
> each platform to which Unix was ported.
>
> That kind of implicit specification is not acceptable for logic.
> Common Logic is an extremely small language, whose semantics is
> completely specified in 6 pages.  It is possible to define formal
> mappings from the abstract CL syntax to and from an open-ended
> number of different syntaxes in such a way that logical equivalence
> is not only provable, but *guaranteed* by the mapping.
>
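A minimal sketch of the kind of mapping described here, restricted to conjunctions of ground atoms: one abstract syntax, a printer and a parser for a CLIF-like and a CGIF-like notation, and an exhaustive check that every small formula survives a round trip unchanged. The function names and the tiny vocabulary are assumptions for illustration; round-trip identity is a necessary property of such a mapping, not by itself a proof of logical equivalence for full Common Logic.

```python
import itertools
import re

# Hypothetical sketch: one abstract syntax (a conjunction of ground atoms, each
# atom a tuple such as ("R", "A", "B")) with a printer and parser for two toy
# notations modeled on CLIF and CGIF. Names must not contain spaces or
# parentheses. Only this tiny subset of Common Logic is covered.
_TOKEN = re.compile(r"[()]|[^\s()]+")

def to_clif(atoms):
    return "(and " + " ".join("(" + " ".join(a) + ")" for a in atoms) + ")"

def to_cgif(atoms):
    return " ".join("(" + " ".join(a) + ")" for a in atoms)

def _parse_atoms(tokens):
    atoms, current = [], None
    for tok in tokens:
        if tok == "(":
            current = []
        elif tok == ")":
            atoms.append(tuple(current))
            current = None
        else:
            current.append(tok)
    return atoms

def from_clif(text):
    tokens = _TOKEN.findall(text)
    if tokens[:2] != ["(", "and"] or tokens[-1:] != [")"]:
        raise ValueError("expected (and ...)")
    return _parse_atoms(tokens[2:-1])

def from_cgif(text):
    return _parse_atoms(_TOKEN.findall(text))

# Exhaustive check over a tiny vocabulary: every conjunction of up to two
# atoms must survive a round trip through each notation unchanged.
vocab = ["R", "A", "B"]
all_atoms = list(itertools.product(vocab, repeat=3))
checked = 0
for n in range(3):
    for conj in itertools.product(all_atoms, repeat=n):
        conj = list(conj)
        assert from_clif(to_clif(conj)) == conj
        assert from_cgif(to_cgif(conj)) == conj
        checked += 1
print(checked, "conjunctions round-tripped")
```

Because the subset is so small, the check can enumerate every case up to the chosen size instead of sampling.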
> The major weakness of XML-based notations is that the angle brackets
> have many nooks and crannies into which people are tempted to stuff
> all kinds of extraneous information, which may or may not have
> semantic significance.  That is the problem with RDF that Tim Bray
> criticized years ago.  In general, a notation that looks ugly is
> in serious danger of having latent bugs and semantic loopholes.
>
> Therefore, the XML specification for RDF is totally unacceptable
> for a standard.  N3 is far simpler than the XML-based notation,
> largely because it lacks the XML nooks and crannies.  Furthermore,
> N3 is the notation that developers generally use, because the
> base-level RDF is far too complex for human use.
>
> But my recommendation is even simpler:  the official standard
> for RDF should be specified in Common Logic.  See the excerpt
> below, which is taken from my previous note.
>
> As the CL version shows, the amount of semantically significant
> information in RDF or N3 is trivial.  Everything else in RDF
> is semantically irrelevant commentary.
>
> Logicians define very precise logics because they keep the
> specification small.  CL is an example.  They also ensure that
> their mappings to different syntaxes are provably equivalent
> by keeping them so small and simple that every aspect of the
> mapping can be exhaustively tested.
>
> The UML diagrams and other modeling languages are intended to
> specify software systems.  We cannot tolerate any ambiguity or
> looseness of any kind in those languages.  Their semantics must
> be defined in terms of a logic whose foundation is as small and
> compact as CL.
>
> If you have such a tiny core semantics with very small mappings
> to other notations, you can guarantee that all those notations
> have identical semantics.  But if you start with a huge, loosely
> defined kludge such as the XML-based definitions of RDF, you
> cannot avoid ambiguities and loose ends, even if there is only
> one specification document and one syntax.
>
> John
> __________________________________________________________________
>
> I would point out that N3 has a simple mapping to CLIF.  Just take
> any set of triples of the form A R B and move each R to the front:
>
>      (and (R1 A1 B1) (R2 A2 B2) (R3 A3 B3) (R4 A4 B4))
>
> In CGIF notation, you can even omit the "(and":
>
>      (R1 A1 B1) (R2 A2 B2) (R3 A3 B3) (R4 A4 B4)
>
> RDF also permits "blank nodes", which, as Pat observed, could be
> expressed by existential quantifiers.  For example, if A2, R3, and B4
> are intended to be blanks, one could write
>
>      (exists (A2 R3 B4)
>         (and (R1 A1 B1) (R2 A2 B2) (R3 A3 B3) (R4 A4 B4)) )
>
> Following is the equivalent in CGIF notation:
>
>      (R1 A1 B1) (R2 [*A2] B2) ([*R3] A3 B3) (R4 A4 [*B4])
>
> The subset of CL with just these features is a perfectly acceptable
> dialect, and it has a direct mapping to N3 and to the semantically
> significant core of RDF.
>
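The N3-to-CL mapping in the note above can be sketched in Python (a hypothetical illustration, not a published tool): each triple "A R B" becomes the atom (R A B); terms declared as blank nodes are existentially quantified in CLIF and marked with defining labels [*X] in CGIF. The note's example uses each blank only once; for repeated blanks the sketch emits the CGIF bound label ?X, an assumption based on CGIF's defining/bound label convention.

```python
# Hypothetical sketch of the N3-to-CL mapping: triple "A R B" -> atom (R A B).
# Blank nodes become existentially quantified variables in CLIF and defining
# labels [*X] (first occurrence) or bound labels ?X (later occurrences) in CGIF.
Triple = tuple[str, str, str]

def _terms_in_order(triples):
    # Yield terms in the order they appear in the CL text: relation first.
    for a, r, b in triples:
        yield from (r, a, b)

def to_clif(triples: list[Triple], blanks: set[str]) -> str:
    body = "(and " + " ".join(f"({r} {a} {b})" for a, r, b in triples) + ")"
    # Quantify each blank once, in order of first appearance.
    quantified = list(dict.fromkeys(
        t for t in _terms_in_order(triples) if t in blanks))
    if not quantified:
        return body
    return f"(exists ({' '.join(quantified)}) {body})"

def to_cgif(triples: list[Triple], blanks: set[str]) -> str:
    seen: set[str] = set()

    def label(t: str) -> str:
        if t not in blanks:
            return t
        if t in seen:
            return f"?{t}"      # bound label: refers back to the first occurrence
        seen.add(t)
        return f"[*{t}]"        # defining label: first occurrence of the blank

    return " ".join("(" + " ".join(label(t) for t in (r, a, b)) + ")"
                    for a, r, b in triples)


example = [("A1", "R1", "B1"), ("A2", "R2", "B2"),
           ("A3", "R3", "B3"), ("A4", "R4", "B4")]
print(to_clif(example, {"A2", "R3", "B4"}))
print(to_cgif(example, {"A2", "R3", "B4"}))
```

Called with the note's example, to_cgif reproduces the CGIF line given there, (R1 A1 B1) (R2 [*A2] B2) ([*R3] A3 B3) (R4 A4 [*B4]), and to_clif produces the same exists form on a single line.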
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
> To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
>
>


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen





