
[ontolog-forum] NLP2RDF

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
Cc: guha@xxxxxxxxxx
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Sun, 04 Dec 2011 01:55:25 -0500
Message-id: <4EDB195D.60602@xxxxxxxxxxx>
There was an announcement on Corpora List about a project to use
RDF as an interchange format for natural language processing.
See their web site:  http://nlp2rdf.org/    (01)

I sent a response to Corpora List, saying that I thought it was
a bad idea.  That generated one offline note saying that my response
was "Brilliant!"  It also generated several that were more critical
or at least noncommittal.    (02)

In my reply to those comments (copy below), I mentioned Guha's talk
and included a pointer to the web site.    (03)

John    (04)

-------- Original Message --------
Subject: Re: [NLP2RDF] [Corpora-List] Announcement: NLP Interchange
Format (NIF) 1.0 Spec, Demo and Reference Implementation
Date: Sat, 03 Dec 2011 20:58:47 -0500
From: John F. Sowa    (05)

Dear Jens, Ewan, Matt, Leo, and Arnim,    (06)

> Are you talking about RDF/XML syntax specifically? Otherwise,
> you are comparing apples and oranges, since RDF can be serialised
> in different formats like Turtle.    (07)

Unfortunately, RDF is wrong in so many ways that it is hard to summarize
them.  There is nothing wrong with having a readable human notation that
compiles into an unreadable but efficient computer version.  But the
RDF/XML notation is so bloated that it is horribly inefficient for
computer processing, network transmission, and storage.    (08)

At the semantic level, a serious flaw of RDF is the complete lack of
typing.  There is no way to indicate that a URI is intended to represent
a literal (the URI itself), the document identified by the URI, the
content of that document, or the result of evaluating that content
(if it happens to contain some executable or interpretable language).    (09)

> IBM Watson does use some background knowledge from the Web of Data (DBpedia).    (010)

The IBM research group headed by Dave Ferrucci was aware of RDF, but
they designed UIMA (Unstructured Information Management Architecture)
as a more compact, readable, and efficient XML-based format.    (011)

For Watson, they used a large volume of web resources, including some
that may have been developed with RDF.  But to say that IBM actually
used RDF in any essential way would be misleading.    (012)

> Facebook has OpenGraph (http://ogp.me/).    (013)

They do not use RDF.  They use RDFa, which is a notation for tagging
HTML (or XML) documents.  But RDFa has nothing in common with RDF/XML
other than the three letters R, D, and F.  Facebook, like nearly
everybody who uses RDFa tags, translates the data from those tags
to a more efficient notation than RDF -- JSON, for example.    (014)

Even the W3C documents show that the translation to JSON is simpler,
more compact, and more efficient than the translation to RDF/XML.
Look at their document http://dev.w3.org/html5/md-LC/ and compare
Section 5.1 (translation to JSON) to Section 5.2 (translation to RDF).    (015)

> Google Shopping uses it (http://purl.org/goodrelations/).    (016)

GoodRelations is an ontology that happens to be expressed in OWL.
But if you look at the actual OWL statements, you'll notice that
they don't use any features of OWL that could not be expressed
in Aristotle's original syllogisms.  In fact, the overwhelming
majority of sites that claim to use OWL don't go beyond Aristotle.    (017)

Furthermore, Google is one of the founding members of schema.org,
which has developed its own vocabulary and methods of processing.
See their hierarchy of terms:  http://schema.org/docs/full.html    (018)

Look at the way they use those terms:  http://schema.org/docs/gs.html
You won't see any RDF or OWL there.    (019)

> I'm not quite sure what you mean by "expressing triples" -- is it
> the URIs that you have problems with?    (020)

That's a separate issue.  I just meant that the information expressed
in RDF/XML can be stated more simply, readably, and efficiently in
many other notations, ranging from LISP to JSON.    (021)
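As a rough illustration of the point above, the following Python sketch writes one triple in a JSON form and in a LISP-like form and compares their sizes with a hand-written RDF/XML version. Everything in it is an assumption made for the example: the example.org subject URI, the JSON key names, the "triple" head symbol, and the helper names to_json and to_sexpr are hypothetical, not part of any standard. Character counts for a single triple are only a crude measure, and the result will vary with the data.

```python
import json

# Hypothetical triple (subject, predicate, object) used only for illustration.
TRIPLE = ("http://example.org/john", "http://xmlns.com/foaf/0.1/name", "John")

# The same statement written by hand in RDF/XML.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/john">
    <foaf:name>John</foaf:name>
  </rdf:Description>
</rdf:RDF>"""


def to_json(triple):
    """Serialize a triple as a JSON object (key names are an assumption)."""
    subject, predicate, obj = triple
    return json.dumps({"subject": subject, "predicate": predicate, "object": obj})


def to_sexpr(triple):
    """Serialize a triple as a LISP-like list (the 'triple' head is an assumption)."""
    # json.dumps on a str yields a double-quoted, escaped string literal.
    return "(triple " + " ".join(json.dumps(term) for term in triple) + ")"


if __name__ == "__main__":
    versions = (
        ("RDF/XML", RDF_XML),
        ("JSON", to_json(TRIPLE)),
        ("S-expression", to_sexpr(TRIPLE)),
    )
    for label, text in versions:
        print(f"{label}: {len(text)} characters")
        print(text)
        print()
```

Running the script prints each serialization with its length, which makes the difference in verbosity for this one statement easy to see.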

> RDF also provides a foundation for OWL, which is increasingly used for
> ontologies    (022)

See the above point about Aristotle.  And see the remarks below by
R. V. Guha, who worked with Tim Bray to define RDF.  Guha now works
at Google, where he is one of the chief proponents of schema.org.    (023)

As for Tim Bray, he apologized for the mistakes in RDF.  As Tim said,
"It's the syntax, stupid."  See his web site:    (024)

    http://www.tbray.org/ongoing/When/200x/2003/05/21/RDFNet    (025)

> the growth of the LOD cloud suggests that there is a lot of mileage
> in the linking part of Linked Data.    (026)

Linked Open Data (LOD) began with the WWW twenty years ago.  The RDF+OWL
method of doing semantics "never caught on." (That is a quotation from
the talk by Guha, cited below.)  Guha also noted that the adoption rate
of schema.org is much faster than RDF and OWL.  In terms of web pages,
it already dwarfs the use of RDF/XML.    (027)

> But RDF itself is just the underlying subj-pred-obj triples model    (028)

As I said, that model can be expressed more easily in LISP or JSON.
For a linguist, calling those triples a "subj-pred-obj model" is so
hopelessly naive that there is no way they could take it seriously.    (029)

> R.V. Guha of Google (talking about schema.org at the Ontolog Forum
> yesterday: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2011_12_01)
> and Dan Brickley said that originally in the late 1990s RDF had
> an s-expression-like syntax: http://www.w3.org/TR/NOTE-pics-ng-metadata
> but that "then XML happened."    (030)

Yes.  In response to my question about using a LISP-like syntax, Guha
said "I wish we could have done that."  For anybody who might still be
interested in this topic, I strongly recommend Guha's talk about
schema.org and the discussion period, which addressed many related
issues.    (031)

> Conceptual Graphs have never really made it either.
> Some don't know what KIF stands for but see no prb in
> foaf:currentProject & monotonic RDF.
> So.. lets call it research    (032)

That is true of every notation for NLP semantics.  WordNet is
probably the most widely used NLP resource, but they don't claim
that their notation is suitable as an interchange format for NLP.    (033)

The FOAF work is more popular because it is at a very low level
that does not require any knowledge of logic, ontology, or
linguistics.  That is also why the usage of schema.org is
growing rapidly:  it doesn't use scary words like 'logic'
or 'ontology' that frighten the unwashed masses.    (034)

John    (035)

