
Re: [ontolog-forum] Semantic Enterprise Architecture

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Fri, 03 Sep 2010 17:27:05 -0400
Message-id: <4C816829.3030902@xxxxxxxxxxx>
On 9/2/2010 2:59 PM, sean barker wrote:
> With regard to Wikipedia, one might also point to DBpedia
> (http://dbpedia.org/About), in which the infoboxes from Wikipedia are
> translated to RDF, and then can be queried via a SPARQL end point.    (01)

Yes, but DBpedia uses RDF and SPARQL.  I cited Wikipedia Miner and JWPL
(Java Wikipedia Library) because they process the original source texts
and they do *not* use any of the tools developed for the Semantic Web.    (02)

The point I was trying to make is that the Semantic Web has been around
for over a decade, and the rate of adoption of its tools has been
slow.  Recently, people have pointed to Linked Open Data (LOD) as an
application of SemWeb tools.  But the jury is still out.    (03)

Note that Wikipedia Miner and JWPL use automated methods to extract
information from Wikipedia and to build a relational DB for high-speed
processing of that information.  These tools are also in their infancy,
but the developers found MySQL to be more useful than RDF and SPARQL,
and they made their tools freely available to anybody else.    (04)

At our VivoMind company, we accept RDF and OWL as input, but we map
them to CGIF and Prolog for more concise and efficient processing.    (05)

Many other companies of varying sizes do something similar.  The
largest is Google.  They also accept RDF as input, but the Google
tool set uses JSON, not RDF for storing triples, arbitrary n-tuples,
and tagged URIs, which may be stored in n-tuples or arrays.    (06)

Even the W3C is getting the message that RDF may be a bit bloated
and inefficient.  They recommend other notations, such as N3,
for human consumption.  They have already approved RDFa as a
vastly simpler notation than RDF, and they are considering
other optional representations.    (07)

I'm in favor of using URIs to link data, but I wouldn't bet on
the current XML-based notation for RDF as a long-term solution.
Following is a note that I sent to another email list.    (08)

John    (09)

-------- Original Message --------
Subject: How to specify a standard
Date: Thu, 02 Sep 2010 22:17:03 -0400
From: John F. Sowa <sowa@xxxxxxxxxxx>
To: architecture-ecosystem@xxxxxxx    (010)

On 9/2/2010 11:41 AM, Ed Barkmeyer wrote:
 > If one standard serialization format is good, surely 7 such standards
 > is much better.  In point of fact, one standard serialization format
 > that is no one's favorite will guarantee successful interchange,
 > while two standards will make interchange more difficult, and more
 > than two makes it impossible.    (011)

The first sentence may or may not be true, but the second sentence
is *false* in general.  There are indeed many cases for which it is
true, but the cases that are essential for OMG and the Semantic Web
are ones in which that statement is false.    (012)

The cases for which it is true include software systems that define
the syntax precisely, but leave the semantics vague.  As an example,
consider the C language, which was developed at AT&T to implement
Unix on a PDP-11.    (013)

The syntax of C was well defined, but the semantics was very loose.
However, the Unix system served as a very large suite of test cases
that imposed strong constraints on the C semantics.  Any compiler
for C that could generate code for running Unix on another platform
had to be strongly compatible with the original C compiler.  Even
then, there were loose ends that required further debugging on
each platform to which Unix was ported.    (014)

That kind of implicit specification is not acceptable for logic.
Common Logic is an extremely small language, whose semantics is
completely specified in 6 pages.  It is possible to define formal
mappings from the abstract CL syntax to and from an open-ended
number of different syntaxes in such a way that logical equivalence
is not only provable, but *guaranteed* by the mapping.    (015)

The major weakness of XML-based notations is that the angle brackets
have many nooks and crannies into which people are tempted to stuff
all kinds of extraneous information, which may or may not have
semantic significance.  That is the problem with RDF that Tim Bray
criticized years ago.  In general, a notation that looks ugly is
in serious danger of having latent bugs and semantic loopholes.    (016)

Therefore, the XML specification for RDF is totally unacceptable
for a standard.  N3 is far simpler than the XML-based notation,
largely because it lacks the XML nooks and crannies.  Furthermore,
N3 is the notation that developers generally use, because the
base-level RDF is far too complex for human use.    (017)

But my recommendation is even simpler:  the official standard
for RDF should be specified in Common Logic.  See the excerpt
below, which is taken from my previous note.    (018)

As the CL version shows, the amount of semantically significant
information in RDF or N3 is trivial.  Everything else in RDF
is semantically irrelevant commentary.    (019)

Logicians define very precise logics because they keep the
specification small.  CL is an example.  They also ensure that
their mappings to different syntaxes are provably equivalent
by keeping them so small and simple that every aspect of the
mapping can be exhaustively tested.    (020)

The UML diagrams and other modeling languages are intended to
specify software systems.  We cannot tolerate any ambiguity or
looseness of any kind in those languages.  Their semantics must
be defined in terms of a logic whose foundation is as small and
compact as CL.    (021)

If you have such a tiny core semantics with very small mappings
to other notations, you can guarantee that all those notations
have identical semantics.  But if you start with a huge, loosely
defined kludge such as the XML-based definitions of RDF, you
cannot avoid ambiguities and loose ends, even if there is only
one specification document and one syntax.    (022)

__________________________________________________________________    (023)

I would point out that N3 has a simple mapping to CLIF.  Just take
any set of triples of the form A R B and move each R to the front:    (024)

    (and (R1 A1 B1) (R2 A2 B2) (R3 A3 B3) (R4 A4 B4))    (025)

In CGIF notation, you can even omit the enclosing "(and ... )":    (026)

    (R1 A1 B1) (R2 A2 B2) (R3 A3 B3) (R4 A4 B4)    (027)
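
As a minimal sketch of the mapping just described, the following Python
moves each relation to the front of its triple.  The function names
triples_to_clif and triples_to_cgif are hypothetical, and the input is
assumed to be a list of (subject, relation, object) tuples of plain
names; this is not a parser for N3 itself.

```python
def triples_to_clif(triples):
    """Render (A, R, B) triples as a single CLIF conjunction."""
    # Move each relation R to the front of its triple: A R B -> (R A B)
    atoms = [f"({r} {a} {b})" for a, r, b in triples]
    return "(and " + " ".join(atoms) + ")"


def triples_to_cgif(triples):
    """Render the same triples in CGIF, where the conjunction is implicit."""
    return " ".join(f"({r} {a} {b})" for a, r, b in triples)


# Illustrative data only, mirroring the placeholder names used above.
triples = [
    ("A1", "R1", "B1"),
    ("A2", "R2", "B2"),
    ("A3", "R3", "B3"),
    ("A4", "R4", "B4"),
]

print(triples_to_clif(triples))
print(triples_to_cgif(triples))
```

The two functions differ only in the wrapper, which reflects how little
of the notation carries semantic weight.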

RDF also permits "blank nodes", which, as Pat observed, could be
expressed by existential quantifiers.  For example, if A2, R3, and B4
are intended to be blanks, one could write    (028)

    (exists (A2 R3 B4)
       (and (R1 A1 B1) (R2 A2 B2) (R3 A3 B3) (R4 A4 B4)) )    (029)

Following is the equivalent in CGIF notation:    (030)

    (R1 A1 B1) (R2 [*A2] B2) ([*R3] A3 B3) (R4 A4 [*B4])    (031)
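
The treatment of blank nodes can be sketched the same way.  In this
Python illustration, the names to_clif and to_cgif and the blanks
parameter are hypothetical.  Rendering a repeated blank as ?name in
CGIF is an assumption based on the CGIF convention for bound labels;
the example above uses each blank only once, so it does not exercise
that case.

```python
def to_clif(triples, blanks=()):
    """Render (A, R, B) triples in CLIF, quantifying over the blank names."""
    atoms = " ".join(f"({r} {a} {b})" for a, r, b in triples)
    body = f"(and {atoms})"
    if not blanks:
        return body
    # Blank nodes become existentially quantified names.
    return f"(exists ({' '.join(blanks)}) {body})"


def to_cgif(triples, blanks=()):
    """Render the same triples in CGIF, marking blanks with defining labels."""
    defined = set()

    def term(name):
        if name not in blanks:
            return name
        if name in defined:
            # Assumption: later occurrences refer back with a bound label.
            return f"?{name}"
        defined.add(name)
        return f"[*{name}]"

    # Terms are rendered in output order, so the first textual
    # occurrence of a blank carries the defining label.
    return " ".join(f"({term(r)} {term(a)} {term(b)})" for a, r, b in triples)


# Illustrative data only, mirroring the placeholder names used above.
triples = [
    ("A1", "R1", "B1"),
    ("A2", "R2", "B2"),
    ("A3", "R3", "B3"),
    ("A4", "R4", "B4"),
]
blanks = ["A2", "R3", "B4"]

print(to_clif(triples, blanks))
print(to_cgif(triples, blanks))
```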

The subset of CL with just these features is a perfectly acceptable
dialect, and it has a direct mapping to N3 and to the semantically
significant core of RDF.    (032)
