Re: [ontolog-forum] NLP2RDF

To: "[ontolog-forum] " <ontolog-forum@xxxxxxxxxxxxxxxx>, "John F. Sowa" <sowa@xxxxxxxxxxx>
Cc: guha@xxxxxxxxxx
From: Pat Hayes <phayes@xxxxxxx>
Date: Sun, 4 Dec 2011 11:23:44 -0600
Message-id: <26897FAF-FC1B-415E-B11B-51A5CB10286F@xxxxxxx>
John, I put up with your half-informed rantings because I love you, but you 
really should be more careful when giving professional advice. See below.    (01)

On Dec 4, 2011, at 12:55 AM, John F. Sowa wrote:    (02)

> There was an announcement on Corpora List about a project to use
> RDF as an interchange format for natural language processing.
> See their web site:  http://nlp2rdf.org/
> I sent a response to Corpora List, saying that I thought it was
> a bad idea.  That generated one offline note saying that my response
> was "Brilliant!"  It also generated several that were more critical
> or at least noncommittal.
> In my reply to those comments (copy below), I mentioned Guha's talk
> and included a pointer to the web site.
> John
> -------- Original Message --------
> Subject: Re: [NLP2RDF] [Corpora-List] Announcement: NLP Interchange
> Format (NIF) 1.0 Spec, Demo and Reference Implementation
> Date: Sat, 03 Dec 2011 20:58:47 -0500
> From: John F. Sowa
> Dear Jens, Ewan, Matt, Leo, and Arnim,
> JL
>> Are you talking about RDF/XML syntax specifically? Otherwise,
>> you are comparing apples and oranges, since RDF can be serialised
>> in different formats like Turtle.
> Unfortunately, RDF is wrong in so many ways that it is hard to summarize
> them.  There is nothing wrong with having a readable human notation that
> compiles into an unreadable but efficient computer version.  But the
> RDF/XML notation is so bloated that it is horribly inefficient for
> computer processing, network transmission, and storage.    (03)

Many in the RDF world would agree. However, RDF is quite independent of 
RDF/XML. Much of the world's RDF is written using other notations, and the RDF 
standard was written using an 'abstract' (graph) syntax precisely to allow a 
variety of surface notations. Just like ISO Common Logic, in fact.     (04)
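
A minimal sketch of the abstract-syntax point, under loud assumptions: the graph is modelled as a plain Python set of (subject, predicate, object) strings, the two surface notations (one N-Triples-like, one S-expression-like) are simplified toys rather than conforming syntaxes, and the example.org names are hypothetical.

```python
# Toy model: an RDF graph is a set of (subject, predicate, object) triples.
# Both parsers are illustrative only; neither is a conforming RDF parser.

LINE_DOC = """
<http://example.org/pat> <http://example.org/knows> <http://example.org/john> .
<http://example.org/john> <http://example.org/knows> <http://example.org/guha> .
"""

SEXPR_DOC = """
(http://example.org/knows http://example.org/john http://example.org/guha)
(http://example.org/knows http://example.org/pat http://example.org/john)
"""

def parse_line_notation(text):
    """Parse lines of the form '<s> <p> <o> .' into a set of triples."""
    graph = set()
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the trailing ' .' and the angle brackets around each term.
        s, p, o = (term.strip("<>") for term in line.rstrip(" .").split())
        graph.add((s, p, o))
    return graph

def parse_sexpr_notation(text):
    """Parse lines of the form '(p s o)' into a set of triples."""
    graph = set()
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        p, s, o = line.strip("()").split()
        graph.add((s, p, o))
    return graph

# Different surface notations, different statement order, same abstract graph.
same_graph = parse_line_notation(LINE_DOC) == parse_sexpr_notation(SEXPR_DOC)
```

Because the graph is a set, neither the order of statements nor the choice of notation affects what is being said.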

> At the semantic level, a serious flaw of RDF is the complete lack of
> typing.  There is no way to indicate that a URI is intended to represent
> a literal (the URI itself), the document identified by the URI, the
> content of that document, or the result of evaluating that content
> (if it happens to contain some executable or interpretable language).    (05)

This comment is uninformed. True, RDF itself is not syntactically typed, which 
is a property it shares with Common Logic, FOL and IKL and a variety of other 
notations. It does however have a notion of type: indeed, the first item in the 
RDF vocabulary is rdf:type. The 'ambiguity' of URI references is intrinsic to 
the very idea of any first-order notation: it is like saying that you don't 
'know' what kind of thing a logical name is intended to denote. Indeed, you 
don't, which is exactly why we write axioms (AKA ontologies) to help fix those 
intended referents.     (06)
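
As a rough illustration of typing by assertion rather than by syntax, the sketch below treats rdf:type as an ordinary predicate in a set of triples. The rdf:type URI is the real one; the example.org names and the `types_of` helper are hypothetical, introduced only for this example.

```python
# rdf:type is just another predicate; "typing" is done by asserting triples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical names: one URI for a person, another for a document about him.
PAT = "http://example.org/pat"
PAT_PAGE = "http://example.org/pat-homepage"

graph = {
    (PAT, RDF_TYPE, "http://example.org/Person"),
    (PAT_PAGE, RDF_TYPE, "http://example.org/Document"),
    (PAT_PAGE, "http://example.org/about", PAT),
}

def types_of(graph, name):
    """Return every class the graph asserts for the given name."""
    return {o for (s, p, o) in graph if s == name and p == RDF_TYPE}

# A name with no type assertions is simply unconstrained, not ill-formed.
unknown_types = types_of(graph, "http://example.org/something-else")
```

Nothing in the syntax of a name says what it denotes; the assertions in the graph are what narrow it down.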

Now, there is an issue, not for RDF but for *any* logical or representational 
formalism deployed on the Web, which is how to relate the intended 
*denotation* of a name to what you get when you use that name in HTTP (or 
indeed in any transfer protocol, so I like to say, in XXTP). If a name is being 
used to denote, say, a human person, what should happen when you do an HTTP GET 
operation on it? What should be the relation between the document (actually, in 
Web-talk, a "representation of a resource") that you get back and your intended 
referent of this name? Or should you not get back anything, but instead receive 
a special HTTP code (e.g. a 303 redirect rather than a 200-level code)? Google 
"http-range-14" if you want to know more. These are indeed tricky issues, 
and there is no single universally accepted answer or even widely accepted 
standard yet. It is probably too soon in the evolution of LOD to know what kind 
of solution is likely to survive in the real world. But note the following. 
First, this is not a trivial matter of 'typing'. It is a genuine new issue, one 
that no extant KR or logical formalism has even begun to tackle. It arises only 
when one tries to face up to the issues of deploying KR on the Internet HTTP 
architecture, at Web scales. So far, RDF is the **only** formalism for which 
this has even been attempted. Second, this issue arises (or will arise) for any 
KR formalism whatsoever. It is not RDF-specific: it happens because of the 
special nature of **names** on the Web. It would arise even if RDF had been a 
fully typed higher-order logic (or choose your own favorite KR notation.) It is 
orthogonal to questions of expressivity and human convenience. And third, it is 
new. Logicians, linguists and philosophers have nothing useful to say about 
this issue as they have never even considered it, or anything remotely like it. 
Solving this is a hard problem and we have to be the ones to do the necessary 
research and come up creatively with the needed ideas. Aristotle ain't going to 
help.    (07)
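
For readers who have not followed the "http-range-14" discussion, here is a hedged sketch of one proposed convention for reading HTTP status codes as hints about what a name denotes. It is not a settled standard, as noted above. The function name and the returned labels are hypothetical, and no network access is performed.

```python
def interpret_get_status(status):
    """Map an HTTP GET status code to a hint about what the URI denotes.

    Sketch of one proposed convention only:
    - 2xx: the URI names a document-like "information resource"
    - 303: the URI may name anything (e.g. a person); the redirect
      target is expected to describe it
    - 4xx: nothing can be concluded
    """
    if 200 <= status <= 299:
        return "information resource"
    if status == 303:
        return "any kind of resource, described at the redirect target"
    if 400 <= status <= 499:
        return "unknown"
    return "not covered by this convention"

# A URI intended to denote a human person would, on this convention,
# answer with 303 rather than with a 200-level code.
hint_for_person_uri = interpret_get_status(303)
```

Note that the convention says nothing about expressivity or typing in the formalism itself; it only concerns how names behave when dereferenced.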

> JL
>> IBM Watson does use some background knowledge from the Web of Data (DBpedia).
> The IBM research group headed by Dave Ferrucci was aware of RDF, but
> they designed UIMA (Unstructured Information Management Architecture)
> as a more compact, readable, and efficient XML-based format.
> For Watson, they used a large volume of web resources, including some
> that may have been developed with RDF.  But to say that IBM actually
> used RDF in any essential way would be misleading.
> JL
>> Facebook has OpenGraph (http://ogp.me/).
> They do not use RDF.  They use RDFa, which is a notation for tagging
> HTML (or XML) documents.  But RDFa has nothing in common with RDF/XML
> other than the three letters R, D, and F.    (08)

They have RDF in common. RDF/XML is one interchange notation for RDF (which as 
I say above, is *defined* in terms of an abstract model called RDF graphs), and 
RDFa is one way (GRDDL is another) to integrate RDF content 'invisibly' into 
XHTML. It is exactly the same RDF in both cases, and can be processed by any 
conforming RDF engine or tool-kit. One can process RDF from both sources 
together uniformly and even translate between them.     (09)

>  Facebook, like nearly
> everybody who uses RDFa tags, translates the data from those tags
> to a more efficient notation than RDF -- JSON, for example.    (010)

JSON is yet another way to write RDF syntax. (In fact it is several such ways.) 
But it is still RDF being written.    (011)

> Even the W3C documents show that the translation to JSON is simpler,
> more compact, and more efficient than the translation to RDF/XML.
> Look at their document http://dev.w3.org/html5/md-LC/ and compare
> Section 5.1 (translation to JSON) to Section 5.2 (translation to RDF).    (012)

Sure, no argument. But you must not identify RDF with RDF/XML.     (013)

> JL
>> Google Shopping uses it (http://purl.org/goodrelations/).
> GoodRelations is an ontology that happens to be expressed in OWL.
> But if you look at the actual OWL statements, you'll notice that
> they don't use any features of OWL that could not be expressed
> in Aristotle's original syllogisms.  In fact, the overwhelming
> majority of sites that claim to use OWL don't go beyond Aristotle.    (014)

But Aristotle, unlike OWL, has not been implemented or standardized on the Web.     (015)

> Furthermore, Google is one of the founding members of schema.org,
> which has developed their own vocabulary and methods of processing.
> See their hierarchy of terms:  http://schema.org/docs/full.html
> Look at the way they use those terms:  http://schema.org/docs/gs.html
> You won't see any RDF or OWL there.    (016)

True, because OWL would be too heavy a buy-in for this kind of application. You 
will see hardly any OWL in the hundreds of billions of RDF linked data triples 
either. What has this got to do with RDF?    (017)

> EK
>> I'm not quite sure what you mean by "expressing triples" -- is it
>> the URIs that you have problems with?
> That's a separate issue.  I just meant that the information expressed
> in RDF/XML can be stated more simply, readably, and efficiently in
> many other notations, ranging from LISP to JSON.    (018)

As soon as you define a 'style' for doing that representation and publish it, 
you have simply defined another surface syntax for RDF. If you just say 'use 
JSON' or 'use LISP', nothing will work properly, as there are many, many ways 
to encode RDF triples in languages like this. So choose (or invent) one, and 
help make the RDF world that little bit richer. FWIW, the current RDF WG is 
discussing ways of encoding RDF in JSON as one of its top-priority items. BTW, 
RDF is older than JSON, which is why this wasn't done in the first RDF standard.     (019)
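
To illustrate why 'use JSON' is underspecified, the following sketch encodes the same triples in two equally plausible JSON shapes. Both shapes are invented for this example (neither is a published RDF-in-JSON format), and the example.org names are hypothetical.

```python
import json

TRIPLES = {
    ("http://example.org/pat", "http://example.org/knows", "http://example.org/john"),
    ("http://example.org/pat", "http://example.org/knows", "http://example.org/guha"),
}

def encode_as_list(triples):
    """Style A: a flat JSON array of [s, p, o] arrays."""
    return json.dumps(sorted(list(t) for t in triples))

def encode_as_nested(triples):
    """Style B: a JSON object keyed by subject, then by predicate."""
    nested = {}
    for s, p, o in sorted(triples):
        nested.setdefault(s, {}).setdefault(p, []).append(o)
    return json.dumps(nested)

def decode_list(text):
    """Read style A back into a set of triples."""
    return {tuple(t) for t in json.loads(text)}

def decode_nested(text):
    """Read style B back into a set of triples."""
    return {
        (s, p, o)
        for s, preds in json.loads(text).items()
        for p, objs in preds.items()
        for o in objs
    }

# Each style round-trips with its own decoder...
style_a_ok = decode_list(encode_as_list(TRIPLES)) == TRIPLES
style_b_ok = decode_nested(encode_as_nested(TRIPLES)) == TRIPLES
# ...but a style A reader does not recover the graph from style B text.
mismatch = decode_list(encode_as_nested(TRIPLES)) != TRIPLES
```

Two parties only interoperate once they agree on one published shape, at which point that shape is in effect one more surface syntax for the same graph.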

> EK
>> RDF also provides a foundation for OWL, which is increasingly used for 
> See the above point about Aristotle.  And see the remarks below by
> R. V. Guha, who worked with Tim Bray to define RDF.  Guha now works
> at Google, where he is one of the chief proponents of schema.org.
> As for Tim Bray, he apologized for the mistakes in RDF.  As Tim said,
> "It's the syntax, stupid."  See his web site:
>    http://www.tbray.org/ongoing/When/200x/2003/05/21/RDFNet
> EK
>> the growth of the LOD cloud suggests that there is a lot of mileage
>> in the linking part of Linked Data.
> Linked Open Data (LOD) began with the WWW twenty years ago.    (020)

Nonsense. Find me any reference to the phrase which predates the widespread 
employment of RDF.    (021)

>  The RDF+OWL
> method of doing semantics    (022)

Um...which one? OWL has evolved quite separately from RDF, and is on a 
different (and much more 'traditional') semantic arc, unlike RDF which is based 
firmly on the Common Logic semantic approach.     (023)

> "never caught on." (That is a quotation from
> the talk by Guha, cited below.)  Guha also noted that the adoption rate
> of schema.org is much faster than RDF and OWL.  In terms of web pages,
> it already dwarfs the use of RDF/XML.    (024)

Counting webpages is misleading. There are linked data 'pages' which contain 
millions of RDF triples.     (025)

> MP
>> But RDF itself is just the underlying subj-pred-obj triples model
> As I said, that model can be expressed more easily in LISP or JSON.
> For a linguist, calling those triples a "subj-pred-obj model" is so
> hopelessly naive that there is no way they could take it seriously.
> Leo
>> R.V. Guha of Google (talking about schema.org at the Ontolog Forum
>> yesterday: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2011_12_01)
>> and Dan Brickley said that originally in the late 1990s RDF had
>> an s-expression-like syntax: http://www.w3.org/TR/NOTE-pics-ng-metadata
>> but that "then XML happened."
> Yes.  In response to my question about using a LISP-like syntax, Guha
> said "I wish we could have done that."    (026)

He isn't alone. I wished that too, and so did Ora Lassila, among many others. 
But there are social/political (small 'p') issues to be handled. Just firing 
off passionate emails to a relatively tiny technical audience isn't going to get 
anything adopted. You have to actually work with many people who just as 
passionately don't agree with you or your basic assumptions, or who have 
passions of their own which are completely orthogonal to yours, or who have 
invested heavily in other technologies and want to have minimal changes made, 
and reach a workable consensus with them. It is a slow, frustrating process 
and the result is a horse designed by a committee, I will agree. But at least 
it is out there being used. There is absolutely no way in hell that any proposal 
to use LISP or (say) IKL, or indeed conceptual graphs, is going to get adopted 
by more than a vanishingly small fraction of the world's users.    (027)

>  For anybody who might still be
> interested in this topic, I strongly recommend Guha's talk about
> schema.org and the discussion period, which addressed many related
> issues.
> AB
>> Conceptual Graphs have never really made it either.
>> Some don't know what KIF stands for but see no prb in
>> foaf:currentProject & monotonic RDF.
>> So.. lets call it research
> That is true of every notation for NLP semantics.  WordNet is
> probably the most widely used NLP resource, but they don't claim
> that their notation is suitable as an interchange format for NLP.
> The FOAF work is more popular because it is at a very low level
> that does not require any knowledge of logic, ontology, or
> linguistics.  That is also why the usage of schema.org is
> growing rapidly:  it doesn't use scary words like 'logic'
> or 'ontology' that frighten the unwashed masses.    (028)

Quite. schema.org is *way* more primitive than even RDF. I don't know why you 
feel that its success (which is of course due to commercial pressure rather 
than technical merit) is something to be celebrated. Compared to RDF it is a 
huge step backwards. I remain confident that it will eventually be absorbed 
into RDF as a centrally important vocabulary/namespace.     (029)

Pat    (030)

> John
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: cg-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> For additional commands, e-mail: cg-help@xxxxxxxxxxxxxxxxxxxx
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/ 
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
>     (031)

IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes    (032)

