
Re: [ontolog-forum] NLP2RDF

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Mon, 05 Dec 2011 09:06:59 -0500
Message-id: <4EDCD003.9030002@xxxxxxxxxxxxxx>
On 12/4/11 6:52 PM, John F. Sowa wrote:
> Pat,
>
> I agree with the technical points in your notes.
>
> As Dan Brickley pointed out, I had written strongly favorable
> emails about the potential of the Semantic Web back in 1998 when
> it was getting off the ground.  But more than a dozen years have
> passed, and the hopes we had for the SW have not been realized.
> It's time to analyze what happened, what could have been done
> better, and what should be done now.
>
> I was traveling last week and didn't have a chance to call in
> for Guha's talk on Thursday.  But I did listen to the audio
> on Saturday morning before I wrote my note to Corpora List
> about NLP2RDF.  I think that talk is relevant to the issues,
> and I recommend it for anybody who had not heard it:
>
> http://ontolog.cim3.net/file/resource/presentation/Schema.org--RVGuha_20111201/Schema.org_RVGuha_20111201b.mp3
>
> As a brief summary or reminder,
>
>    1. Guha discussed the schema.org project, which was founded as
>       a joint effort by Google, Bing (Microsoft), and Yahoo!
>       For more info, see http://schema.org/docs/faq.html
>
>    2. He is now working at Google on that project (but he didn't
>       give any confidential info about what they're doing).
>
>    3. In the discussion period, he did make some brief comments
>       about RDF.  A quotation:
>
>       "Somehow, RDF never caught on... At least RDFa is here to stay."
>
>    4. Since I wasn't able to call in for the talk, I wrote a question
>       about why Guha hadn't used LISP notation for triples, and
>       Steve Ray read it.  Guha's answer:  "I wish we could have
>       done that."  But the powers that be insisted on XML.    (01)

Yes, but this is still a comment about markup syntax.    (02)

RDF/XML obscured the triple; that was, and still is, its fundamental shortcoming.    (03)

Conflating RDF/XML with RDF itself (which is EAV/SPO triples that include 
URIs re. E/S, A/P, V/O) in those early years didn't help matters either.    (04)

All of this is what I mean by the genealogical flaw in RDF's narrative 
(this isn't something Pat is guilty of btw., but many other RDF 
supporters are).
>
>    5. The adoption rate of the schema.org vocabulary and notation
>       has been very fast -- much faster than RDF and even faster
>       than Google had expected.    (05)

Yes, but here you are refusing to separate syntax from the base graph model. 
Basically, this is what I mean by RDF's data model claim not being 
generally accepted. Especially as it's generally seen as a tactical 
pivot from the first coming which was all about RDF/XML first and the 
underlying directed graph based data model second, at least narrative wise.    (06)

Schema.org uses Microdata syntax for directed graph model based data 
islands embedded in HTML documents. Microdata has the advantage of 
making EAV/SPO triples visible without RDF syntax tax. It also doesn't 
introduce mime type problems re. text/html unlike RDFa which attempts to 
negate a deeper mime type war (xhtml vs html)  via a specific DOCTYPE 
declaration that is very problematic re. Linked Data. Basically, 
developers who are already familiar with HTML parsing find Microdata quite 
natural.    (07)
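
To make the "triples are visible" point concrete, here is a minimal sketch in Python of pulling EAV triples out of a Microdata island with nothing but the standard library HTML parser. It assumes a single flat item (no nested itemscope), and the example.org URLs, the "_:item" fallback label, and the "type" attribute name are illustrative assumptions, not anything Schema.org or the Microdata spec prescribes.

```python
from html.parser import HTMLParser


class MicrodataTriples(HTMLParser):
    """Collect (entity, attribute, value) triples from one flat Microdata item."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subject = None   # identifier of the current item
        self.pending = None   # itemprop still waiting for its text value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemscope" in a:
            # Use itemid when present, else a placeholder label (assumption).
            self.subject = a.get("itemid") or "_:item"
            if a.get("itemtype"):
                self.triples.append((self.subject, "type", a["itemtype"]))
        elif "itemprop" in a and self.subject is not None:
            if a.get("href"):
                # Link-valued property: the value is the href URL.
                self.triples.append((self.subject, a["itemprop"], a["href"]))
            else:
                # Text-valued property: the value is the element's text.
                self.pending = a["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self.pending and text:
            self.triples.append((self.subject, self.pending, text))
            self.pending = None


def extract_triples(html):
    parser = MicrodataTriples()
    parser.feed(html)
    return parser.triples


SAMPLE = """
<div itemscope itemtype="http://schema.org/Mountain"
     itemid="http://example.org/everest">
  <span itemprop="name">Mount Everest</span>
  <a itemprop="url" href="http://example.org/everest.html">page</a>
</div>
"""

if __name__ == "__main__":
    for triple in extract_triples(SAMPLE):
        print(triple)
```

A real Microdata processor also handles nested items, itemref, and multi-valued itemprop attributes; the sketch only shows that ordinary HTML parsing is enough to get at the triples.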

Schema.org also benefits from the might and influence of Google. All 
in all, this is good.    (08)

Over the weekend I posted a simple example [1] of how cross linking 
DBpedia and Schema.org actually delivers massive network effects for the 
burgeoning Web of Linked Data.    (09)

>
>    6. A primary reason for the rapid adoption is that the schema.org
>       vocabulary and notation is easy for Webmasters to learn and use.    (010)

As per comments above, plus a very pragmatic dimension in the form of a 
simple value proposition: put this markup in your pages and your pages 
become more visible to Google, Bing, and Yahoo! Basically, the 
beginning of the end of black hat SEO.    (011)

>
> Since Guha had been the original designer of RDF (with Tim Bray as
> the XML expert on the project), that is not a ringing endorsement.
> Schema.org is not using RDF, although RDFa can be used in conjunction
> with it.    (012)

Schema.org is using EAV + URIs as its directed graph based data model. 
Like RDF, it doesn't offer specifics about the de-reference behavior of 
URIs. Unlike RDF, it isn't closely aligned with RDFS and OWL with 
regards to vocabularies that address relation semantics.    (013)

> But note the following from the schema.org FAQ page:
>
>> RDFa is extensible and very expressive, but the substantial
>> complexity of the language has contributed to slower adoption.    (014)

+1    (015)

Prefixes are a strange premature optimization that most developers 
(outside the Semantic Web community) simply don't care about. They have 
no problem working with long URLs or URIs as global identifiers.    (016)
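
For readers outside the Semantic Web community, this is roughly all a prefix buys you. The sketch below is Python; the prefix table and the expand helper are hypothetical names used for illustration only, and it assumes a prefixed name has the form "prefix:local".

```python
# Hypothetical prefix table; any mapping of short names to namespace URIs works.
PREFIXES = {
    "schema": "http://schema.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI; leave anything else unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes and not local.startswith("//"):
        return prefixes[prefix] + local
    return name


if __name__ == "__main__":
    print(expand("schema:Mountain"))
    print(expand("http://schema.org/Mountain"))
```

Both calls yield the same identifier, which is the point: the long form works as a global identifier on its own, and the short form only works once every consumer shares the prefix table.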

>> Microdata is the most recent well-known standard, created along
>> with HTML5. It strikes a balance between extensibility and simplicity,
>> and is most suitable for building the schema.org.    (017)

+1    (018)

> Some comments on your notes:
>
> PH
>> Seems to me that RDF has (a whole host of tedious small problems)
>> but no really big, central problem.
> The biggest problem is the poor "bang for the buck".  RDF/XML is
> horribly inefficient for its functionality.  The best thing to do
> is to declare RDF/XML as "functionally stabilized" -- that's IBM's
> euphemism for "Obsolete, but we still have to support it for a while."    (019)

Again, the RDF syntax and model separation isn't reflected in your 
comments. I understand where you're coming from, but the distinction 
(however dubious) should at least be acknowledged. RDF the model is 
distinct from RDF/XML syntax.    (020)

RDF/XML syntax is being given the very treatment you suggest. Pat 
(wink!) understands this sleight of hand, he is with you on this one 
since he cares much more about the model than any particular syntax.    (021)

>
> As Guha said, "RDFa is here to stay."  But other notations are used
> for the computable form.  In JSON, for example, a triple is written
>
>      [A, B, C]
>
> And a typed triple (or N-tuple) can be written
>
>      {Type1:A, Type2:B, Type3:C}    (022)

Yes, this very point veers back to my central position re. RDF problems. 
RDF did not invent the triple or 3-tuple mechanism for representing 
directed graphs. Thus, genealogy re. 3-tuples (triples) needs some 
acknowledgement in RDF model oriented narratives. Simply implying 
(even inadvertently) that RDF is the 3-tuple progenitor is wrong. It's 
so wrong that you and Pat are sorta talking past one another. That's how 
bad the genealogy challenged narrative of RDF has become.    (023)
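
As a minimal sketch of the point that 3-tuples need no RDF syntax at all, the Python below holds a directed graph as plain tuples, round-trips it through the JSON array form [A, B, C] quoted above, and indexes it by subject. The example.org URIs and the helper names are assumptions made for illustration.

```python
import json

# Each triple is (subject, predicate, object); URIs are illustrative only.
TRIPLES = [
    ("http://example.org/everest", "http://schema.org/name", "Mount Everest"),
    ("http://example.org/everest", "http://example.org/locatedIn",
     "http://example.org/himalayas"),
]


def to_json(triples):
    """Serialize triples as a JSON array of three-element arrays."""
    return json.dumps([list(t) for t in triples])


def from_json(text):
    """Parse the JSON form back into a list of 3-tuples."""
    return [tuple(t) for t in json.loads(text)]


def as_graph(triples):
    """Index triples as subject -> list of (predicate, object) edges."""
    graph = {}
    for s, p, o in triples:
        graph.setdefault(s, []).append((p, o))
    return graph


if __name__ == "__main__":
    wire = to_json(TRIPLES)
    print(wire)
    print(as_graph(from_json(wire)))
```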

As I've said many times in the past, RDF shouldn't make sole claims, 
i.e., imply being the sole option for:    (024)

1. directed graph based data models and the underlying markup languages 
for structured data representation and across-the-wire serialization
2. Linked Data resource construction based on the use of specific 
structured data representation markup and across-the-wire serialization 
formats.    (025)

RDF (the model) and its associated syntaxes do have clearly 
distinguishable merit in the following areas:    (026)

1. typed and untyped literals semantics
2. i18n.    (027)

I don't regard RDFS and OWL as being bound to any RDF syntax. Of course, 
they leverage the same underlying directed graph based model plus the 
incorporation of URIs.    (028)

>> The 'ambiguity' of URI references is intrinsic to the very idea of
>> any first-order notation: it is like saying that you don't 'know'
>> what kind of thing a logical name is intended to denote. Indeed,
>> you don't, which is exactly why we write axioms (AKA ontologies)
>> to help fix those intended referents.
> I have no quarrel about using an untyped model theory to define
> an untyped base language.  But both CLIF and CGIF have an extended
> syntax that restricts the range of a quantifier to a specific type,
> determined by a monadic relation named in the quantifier field.
>
> With RDF, a typed triple (such as the JSON example above) would
> expand to four untyped RDF triples.  That expansion is OK in
> a formal specification, but you can't tolerate that bloat at
> run time.   RDF/XML is already too inefficient.    (029)

Yes, but you can also do it in JSON
> [SNIP]    (030)
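
One possible reading of the expansion John describes, sketched in Python: a typed triple {Type1:A, Type2:B, Type3:C} becomes the base triple plus one type statement per position, i.e. four untyped triples. Using rdf:type for the type statements and relying on key order for subject, predicate, and object are assumptions of this sketch; the function name is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand_typed_triple(typed):
    """Expand {TypeS: s, TypeP: p, TypeO: o} into four untyped triples.

    Assumes the dict has exactly three entries, in subject, predicate,
    object order (Python dicts preserve insertion order).
    """
    if len(typed) != 3:
        raise ValueError("a typed triple needs exactly three entries")
    (type_s, s), (type_p, p), (type_o, o) = typed.items()
    return [
        (s, p, o),                 # the base triple
        (s, RDF_TYPE, type_s),     # type of the subject
        (p, RDF_TYPE, type_p),     # type of the predicate
        (o, RDF_TYPE, type_o),     # type of the object
    ]


if __name__ == "__main__":
    for triple in expand_typed_triple({"Type1": "A", "Type2": "B", "Type3": "C"}):
        print(triple)
```

The same four statements can be carried as four JSON arrays, so the expansion is a property of keeping the model untyped, not of any one syntax.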

> I agree that schema.org is more primitive, but I believe that the
> "commercial pressure" is more meaningful than the W3C's "political
> pressure".    (031)

+1    (032)

>   Webmasters are adopting it because they find it easy
> to learn and easy to use for what they need now. Sooner or later,
> they'll need more.    (033)

+1    (034)

>
>> I remain confident that it will eventually be absorbed into RDF
>> as a centrally important vocabulary/namespace.
> I'm sure that schema.org will evolve into something very different,
> but RDF has nothing to offer a webmaster.    (035)

Yes, if RDF != EAV + URIs :-)    (036)

>
> My recommendation:  Use CL to define the semantics of the JSON
> notation and bypass RDF.  Then develop some tools to work with it.
> One example would be controlled English (at the level of Peter's
> version of syllogisms) to define type hierarchies -- that is the
> most useful subset of OWL.  Then develop some rule-based tools
> for more complex reasoning.  That combination -- JSON + rules +
> controlled English -- would be a simple, easy-to-use replacement
> for RDF + whatever version of OWL anybody actually uses.    (037)

+1 assuming we just can't ever find a way for the RDF narrative to 
include better genealogy by loosening its sole claim to EAV/SPO  based 
directed graphs that include URIs.    (038)

>
> If the W3C wants to claim that combination for the SW, that's fine
> with me.  But the "decidability thought police" would probably shoot
> it down for the SW.  That would just mean that the SW wouldn't get
> the credit.  That's their loss.
>
> John
>
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
>
>
Links (in both cases note the page footers and <head/> should you view 
source):    (039)

1. 
http://linkeddata.uriburner.com/describe/?url=http://schema.org/Mountain 
-- simple example showing the benefits (network effects) of cross 
linking DBpedia and Schema.org    (040)

2. http://lod.openlinksw.com/describe/?url=http://schema.org/Mountain -- 
ditto, but note the additional number of pages relative to #1.    (041)

--     (042)

Regards,    (043)

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen    (044)


