Re: [ontolog-forum] NLP2RDF

To: Pat Hayes <phayes@xxxxxxx>
Cc: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Sun, 04 Dec 2011 18:52:46 -0500
Message-id: <4EDC07CE.6010107@xxxxxxxxxxx>
Pat,    (01)

I agree with the technical points in your notes.    (02)

As Dan Brickley pointed out, I had written strongly favorable
emails about the potential of the Semantic Web back in 1998 when
it was getting off the ground.  But more than a dozen years have
passed, and the hopes we had for the SW have not been realized.
It's time to analyze what happened, what could have been done
better, and what should be done now.    (03)

I was traveling last week and didn't have a chance to call in
for Guha's talk on Thursday.  But I did listen to the audio
on Saturday morning before I wrote my note to Corpora List
about NLP2RDF.  I think that talk is relevant to the issues,
and I recommend it for anybody who has not heard it:    (04)

http://ontolog.cim3.net/file/resource/presentation/Schema.org--RVGuha_20111201/Schema.org_RVGuha_20111201b.mp3    (05)

As a brief summary or reminder,    (06)

  1. Guha discussed the schema.org project, which was founded as
     a joint effort by Google, Bing (Microsoft), and Yahoo!
     For more info, see http://schema.org/docs/faq.html    (07)

  2. He is now working at Google on that project (but he didn't
     give any confidential info about what they're doing).    (08)

  3. In the discussion period, he did make some brief comments
     about RDF.  A quotation:    (09)

     "Somehow, RDF never caught on... At least RDFa is here to stay."    (010)

  4. Since I wasn't able to call in for the talk, I wrote a question
     about why Guha hadn't used LISP notation for triples, and
     Steve Ray read it.  Guha's answer:  "I wish we could have
     done that."  But the powers that be insisted on XML.    (011)

  5. The adoption rate of the schema.org vocabulary and notation
     has been very fast -- much faster than RDF and even faster
     than Google had expected.    (012)

  6. A primary reason for the rapid adoption is that the schema.org
     vocabulary and notation are easy for Webmasters to learn and use.    (013)

Since Guha had been the original designer of RDF (with Tim Bray as
the XML expert on the project), that is not a ringing endorsement.
Schema.org is not using RDF, although RDFa can be used in conjunction
with it.  But note the following from the schema.org FAQ page:    (014)

> RDFa is extensible and very expressive, but the substantial
> complexity of the language has contributed to slower adoption.
> Microdata is the most recent well-known standard, created along
> with HTML5. It strikes a balance between extensibility and simplicity,
> and is most suitable for building the schema.org.    (015)

Some comments on your notes:    (016)

PH
> Seems to me that RDF has (a whole host of tedious small problems)
> but no really big, central problem.    (017)

The biggest problem is the poor "bang for the buck".  RDF/XML is
horribly inefficient for its functionality.  The best thing to do
is to declare RDF/XML as "functionally stabilized" -- that's IBM's
euphemism for "Obsolete, but we still have to support it for a while."    (018)

As Guha said, "RDFa is here to stay."  But other notations are used
for the computable form.  In JSON, for example, a triple is written    (019)

    [A, B, C]    (020)

And a typed triple (or N-tuple) can be written    (021)

    {Type1:A, Type2:B, Type3:C}    (022)
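
A minimal sketch of those two forms in Python, assuming A, B, C and
Type1..Type3 are placeholder strings taken from the example above.
Strict JSON requires object keys to be quoted strings, so the typed
form is written with quotes here:

```python
import json

# An untyped triple as a JSON array: position alone carries the role.
triple = ["A", "B", "C"]

# A typed triple as a JSON object: each key names the type of its value.
# The quoting of keys and values is required by strict JSON.
typed_triple = {"Type1": "A", "Type2": "B", "Type3": "C"}

triple_text = json.dumps(triple)
typed_text = json.dumps(typed_triple)

# Round-tripping through text preserves both forms.
assert json.loads(triple_text) == triple
assert json.loads(typed_text) == typed_triple
```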

> The 'ambiguity' of URI references is intrinsic to the very idea of
> any first-order notation: it is like saying that you don't 'know'
> what kind of thing a logical name is intended to denote. Indeed,
> you don't, which is exactly why we write axioms (AKA ontologies)
> to help fix those intended referents.    (023)

I have no quarrel about using an untyped model theory to define
an untyped base language.  But both CLIF and CGIF have an extended
syntax that restricts the range of a quantifier to a specific type,
determined by a monadic relation named in the quantifier field.    (024)

With RDF, a typed triple (such as the JSON example above) would
expand to four untyped RDF triples.  That expansion is OK in
a formal specification, but you can't tolerate that bloat at
run time.   RDF/XML is already too inefficient.    (025)
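
One plausible reading of that expansion, sketched in Python.  The
function name expand_typed_triple and the use of rdf:type as the
typing predicate are assumptions made for illustration, not something
stated in the note:

```python
def expand_typed_triple(typed, type_predicate="rdf:type"):
    """Expand {Type1: A, Type2: B, Type3: C} into untyped (s, p, o) triples.

    Assumes the mapping has exactly three entries with distinct type names.
    """
    (t1, a), (t2, b), (t3, c) = typed.items()
    return [
        (a, b, c),               # the relationship itself
        (a, type_predicate, t1), # one typing triple per position
        (b, type_predicate, t2),
        (c, type_predicate, t3),
    ]


typed = {"Type1": "A", "Type2": "B", "Type3": "C"}
untyped = expand_typed_triple(typed)
```

One typed triple becomes four untyped ones, which is the size
increase the paragraph above objects to.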

> how to relate the intended *denotation* of a name to what you
> get when you use that name in HTTP (or indeed in any transfer
> protocol, so I like to say, in XXTP)...
> It arises only when one tries to face up to the issues of
> deploying KR on the Internet HTTP architecture, at Web scales.    (026)

First, I would separate the issue of KR from the issue of HTTP
access.  The KR is a description of relationships.  HTTP is an
access method that an application would use, and it should make
the decision about how or whether to follow the links.    (027)
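
A small Python sketch of that separation, under stated assumptions:
the example.org names, the function links_to_follow, and the
should_fetch policy are all hypothetical, and no network access is
performed.  The triples are plain data; which names get dereferenced
is decided entirely by the application's policy:

```python
from urllib.parse import urlparse

# The KR layer: plain relationships, with no access behavior attached.
triples = [
    ("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
    ("http://example.org/a", "http://example.org/name", "Alice"),
]


def links_to_follow(triples, should_fetch):
    """Return the names the application chooses to dereference, in order."""
    chosen = []
    for triple in triples:
        for term in triple:
            is_http = urlparse(term).scheme in ("http", "https")
            if is_http and should_fetch(term) and term not in chosen:
                chosen.append(term)
    return chosen


# Two applications, two policies, the same KR.
follow_nothing = links_to_follow(triples, lambda uri: False)
follow_predicates = links_to_follow(
    triples, lambda uri: uri.endswith(("/knows", "/name"))
)
```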

> Logicians, linguists and philosophers have nothing useful to
> say about this issue as they have never even considered it,
> or anything remotely like it.    (028)

Of course not.  They deal with the KR issues, not with the
computational issues.  The question of "self-describing data"
and what to do with it has been thoroughly analyzed since the
1960s in terms of both hardware and software implementations:    (029)

  1. LISP had a property list with every variable.    (030)

  2. Simula 67 (the first object-oriented language, which
     influenced all the others) had self-describing data in 1967
     and the descriptor determined how an object would be processed
     when invoked.    (031)

  3. Magnetic tapes routinely had a header field that described
     the file formats and contents.    (032)

  4. The Burroughs B5000, introduced in 1961, had a hardware
     descriptor for each data item in core storage (RAM).    (033)

  5. The Very Large Data Base (VLDB) conferences, which have been
     held annually since 1975, have gone into excruciating detail
     about such issues.  The weaknesses of the SQL implementation
     (Oracle's fault for implementing an early spec from IBM's
     Journal of R & D) have been recognized since the late 1970s.
     Many excellent proposals were published over the years,
     but there were two major obstacles:  Oracle and IBM.    (034)

If you have a language with typed (or restricted) variables and data
with descriptors of their content, the size or amount of your data
is irrelevant.  When you access something, you check the types.
Of course, you can't always depend on the presence or accuracy
of a type descriptor with all data.  If missing, you treat the
data as untyped and take some default action.  Eventually,
anybody who wants performance will add appropriate descriptors.
If not, they get the default.  If they don't like it, too bad.    (035)
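
The check-on-access behavior described above can be sketched in
Python.  The descriptor names ("integer", "date"), the item layout,
and the handlers are assumptions chosen for illustration:

```python
def default_action(value):
    # No usable descriptor: treat the data as untyped.
    return ("untyped", str(value))


# Handlers keyed by type descriptor.
HANDLERS = {
    "integer": lambda v: ("integer", int(v)),
    "date": lambda v: ("date", tuple(int(part) for part in v.split("-"))),
}


def process(item):
    """Check the descriptor on access; use the default if it is absent or unknown."""
    handler = HANDLERS.get(item.get("type"), default_action)
    return handler(item["value"])


results = [
    process({"type": "integer", "value": "42"}),
    process({"type": "date", "value": "2011-12-04"}),
    process({"value": "2011-12-04"}),  # descriptor missing
]
```

The cost of the check is per access, so it does not grow with the
total amount of data.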

> JSON is yet another way to write RDF syntax. (In fact it is several
> such ways.) But it is still RDF being written.    (036)

Yes, but Guha designed RDF based on his experience in using LISP
to implement CycL.  The only credit you can give the W3C is for
promoting a teeny-tiny version of CycL with a lousy notation.    (037)

> But Aristotle, unlike OWL, has not been implemented or standardized
> on the Web.    (038)

The Scholastic version was standardized by Peter of Spain in 1239.
Peter's textbook went through many editions over the centuries, and
parts of it were copied in every textbook on logic from the 13th
to the 19th centuries.  Furthermore, Peter had more authority than
the W3C -- he became Pope John XXI in 1276.    (039)

> As soon as you define a 'style' for doing that representation and
> publish it, you have simply defined another surface syntax for RDF.
> If you just say 'use JSON' or "use LISP', nothing will work properly,
> as there are many, many ways to encode RDF triples in languages like
> this. So choose (or invent) one...    (040)

The one I would choose is the model theory for Common Logic, which,
thanks to you, happens to be upward compatible with LBase for RDF.    (041)

> BTW, RDF is older than JSON, which is why this wasnt done in
> the first RDF standard.    (042)

But JSON, as the name indicates, is JavaScript Object Notation.
JavaScript was introduced in Netscape Navigator in 1995.  Since
Netscape employed both Guha and Tim Bray, they could have chosen
JSON in 1997.  In any case, Guha said that he would have preferred
LISP notation to XML.    (043)

> He isnt alone. I wished that too, and so did Ora Lassila, among
> many others. But there are social/political (small 'p') issues
> to be handled. Just firing off passionate emails to a relatively
> tiny technical audience isnt going to get anything adopted.    (044)

I don't have to persuade anybody. RDF/XML is dying of its own weight.
My note to Corpora List was just a warning to that community to avoid
adopting dead-end technology for a new project.  If they do adopt it,
it would become their problem, not mine.    (045)

> I don't know why you feel that its [schema.org's] success (which is
> of course due  to commercial pressure rather than technical merit) is
> something  to be celebrated. Compared to RDF it is a huge step backwards.    (046)

I agree that schema.org is more primitive, but I believe that the
"commercial pressure" is more meaningful than the W3C's "political
pressure".  Webmasters are adopting it because they find it easy
to learn and easy to use for what they need now. Sooner or later,
they'll need more.    (047)

> I remain confident that it will eventually be absorbed into RDF
> as a centrally important vocabulary/namespace.    (048)

I'm sure that schema.org will evolve into something very different,
but RDF has nothing to offer a webmaster.    (049)

My recommendation:  Use CL to define the semantics of the JSON
notation and bypass RDF.  Then develop some tools to work with it.
One example would be controlled English (at the level of Peter's
version of syllogisms) to define type hierarchies -- that is the
most useful subset of OWL.  Then develop some rule-based tools
for more complex reasoning.  That combination -- JSON + rules +
controlled English -- would be a simple, easy-to-use replacement
for RDF + whatever version of OWL anybody actually uses.    (050)
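
A rough Python sketch of how such a combination might look.  The
sentence pattern "Every X is a Y.", the function names, and the JSON
layout are all assumptions made for illustration; the single rule is
the syllogism Barbara (every S is M, every M is P, therefore every
S is P):

```python
import json
import re

# Controlled English: only sentences of the form "Every X is a/an Y."
PATTERN = re.compile(r"^Every (\w+) is an? (\w+)\.$")


def parse_hierarchy(sentences):
    """Map each subtype to the set of supertypes stated directly."""
    parents = {}
    for sentence in sentences:
        match = PATTERN.match(sentence.strip())
        if match is None:
            raise ValueError(f"not in the controlled subset: {sentence!r}")
        sub, sup = match.groups()
        parents.setdefault(sub, set()).add(sup)
    return parents


def is_a(parents, sub, sup):
    """Apply Barbara repeatedly: follow stated supertypes until sup is found."""
    seen, stack = set(), [sub]
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return False


hierarchy = parse_hierarchy([
    "Every dog is a mammal.",
    "Every mammal is an animal.",
])

# The same hierarchy in JSON notation (sets become sorted lists).
as_json = json.dumps({k: sorted(v) for k, v in hierarchy.items()})
```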

If the W3C wants to claim that combination for the SW, that's fine
with me.  But the "decidability thought police" would probably shoot
it down for the SW.  That would just mean that the SW wouldn't get
the credit.  That's their loss.    (051)

John    (052)
