ontolog-forum

Re: [ontolog-forum] Natural Language based SPARQL Generator

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Fri, 01 Feb 2013 09:49:56 -0500
Message-id: <510BD614.9070603@xxxxxxxxxxxxxx>
On 2/1/13 9:43 AM, Kingsley Idehen wrote:
> On 1/31/13 11:36 PM, John F Sowa wrote:
>> Kingsley and Doug,
>>
>> The Quepy developers use the NLTK toolkit, which is an open-source
>> set of Python-based software for NLP processing.  It's widely used
>> for teaching purposes.  But it is not state of the art NLP software.
>>
>> KI
>>> it's only using DBpedia whereas if it used the LOD cloud cache
>>> there would be a much broader knowledgebase.
>> Google answered every one of my five questions, but Quepy could
>> only answer one of them.  I also tried Bing, which did just
>> as well as Google on all five.
>
> But I am not presenting this as the ultimate question and answer 
> machine. Far from it. I am presenting it as a showcase of NLP aiding 
> SPARQL generation.
>
>>
>> In fact, Bing got a better answer for the question "When did
>> the Revolutionary War end?"  In addition to hits that were similar
>> to Google's, Bing gave the following answer above the list of hits:
>>
>> Bing
>
> See my comments above.
>
> Google, Bing, and others are silos. I am in the business of 
> silo-busting via Web architecture and open standards. I am most 
> interested in a global distributed database (offering equal billing to 
> extensional and intensional functionality) where hyperlinks are super 
> keys that resolve to entity relationship graphs endowed with machine 
> and human comprehensible entity relationship semantics.
>>> The American Revolutionary War began on Wednesday, April 19, 1775
>>> and ended on Wednesday, September 3, 1783.
>> DF
>>> I asked for the President of the UK, and since the SPARQL query
>>> was for a leader, not a president, the answers returned were
>>> David Cameron and Queen Elizabeth II.
>> I typed "Who is the president of the UK?" to Google and Bing.
>> Both of them found the following plus some other relevant hits:
>>
>> Bing and Google
>>>      Who is the president of the United Kingdom - The Q&A wiki
>>>      wiki.answers.com › … › United Kingdom › UK Politics
>>>
>>>      The United Kingdom is a parliamentary constitutional monarchy
>>>      and has no president. HM Queen Elizabeth II is Head of State.
>>>      The Right Honourable David Cameron MP is ...
>> Of course, Google and Microsoft (Bing) are multi-billion dollar
>> corporations with huge R & D budgets.  Quepy is OK for homework
>> exercises in a course on NLP.
>
> Yes, but you continue to present examples that really aren't aligned 
> to my core point. Again, Google, Bing, etc. are all silos. Being a 
> corporation doesn't mean they have to be data silo vectors. They will 
> ultimately be far more successful once they understand the virtues of 
> de-silo-fication, at Web-scale.
>
>>
>> DF
>>> It seems to generate SPARQL without using any ontology.
>> I read some of the Quepy documentation, which indicates that
>> they do recognize "classes" and "subclasses".  But Google,
>> Bing, and many other commercial companies have much richer
>> resources.
>
> Resource riches don't matter so much on the Web. Google didn't even 
> exist 20 years ago. Imagine telling a VC (circa 1992) that Google 
> and others would emerge in the future, at the expense of Microsoft. 
> 99.99% of the time you would have been laughed out of the room.
>
> The Web inflection is still very much in its infancy.
>>
>> KI
>>> To conclude, the key point I sought to make via this post is that
>>> natural language based SPARQL generation is an emerging frontier.
>> I doubt that.  Neither Google nor Bing uses RDF, SPARQL, or OWL.
>> Instead, they do pattern matching directly to the raw, unannotated
>> natural language texts.
>
> They don't really matter as much as you assume. They aren't the 
> beacons of measurement in this realm.
>
> They only become interesting whenever they plug into the global Web 
> DBMS and start publishing hyperlink based super keys. Whenever they 
> come to that reality, SPARQL's utility will be crystal clear to them.
>>
>> I'll admit that there is a large and growing corpus of tagged
>> documents, for which RDF processing can be useful.  But the raw NL
>> documents are growing at a much faster rate than the tagging.
>
> They are increasingly being fused. There are many collaborations in the 
> Linked Data realm that leverage NLP services. I've been involved with 
> several (and counting) with lots of simple live demonstrations in hand 
> [1][2][3].
>
>>
>> KI
>>> that never happened on the SQL front, in any kind of webby way, of
>>> course, I would happily look at a live Web accessible SQL based system
>>> to see if it can match the most basic SPARQL functionality demonstrated
>>> by Quepy fronting SPARQL
>> SQL has a superset of the expressive power of RDF.  People had developed
>> very sophisticated NLP query systems for DB queries 30 years ago.  For
>> examples, see http://www.jfsowa.com/pubs/futures.pdf .  Most of those
>> systems never became profitable or they remained niche products.
>
> I want links to existing live systems based on SQL that can deliver 
> heterogeneous access to disparate Web accessible data sources via 
> hyperlink based super keys. Where are those systems? They don't exist 
> for a very good reason: they can't handle the nature of the Web:
>
> 1. Unpredictable query request volume
> 2. Unpredictable query request scope
> 3. Unpredictable query result set navigation across entity 
> relationship dimensions via cursors (static, keyset, dynamic, or mixed)
> 4. Unpredictable attention span of users.
>
> There is a showdown point on the horizon that will ultimately bring 
> 1-4 in scope re. the likes of Google, Bing, and any other data silo 
> player.
>>
>> But some of them have been connected to speech systems for those
>> annoying automated telephone systems.  Replacing SQL with SPARQL
>> will do nothing to make them less annoying.
>
> Not my point. See my comments above. It's all about data 
> virtualization via hyperlink based super keys.
>>
>> As for webifying a version of SQL, that would be fairly easy to do.
>
> Where is it?
>
>> In fact, Tim B-L included SQL as one of the languages that had to be
>> supported.  (See his DAML proposal of 2000.)  Oracle and IBM do that
>> with their products.  But the clueless academics who jumped on the
>> DAML bandwagon refused to support SQL.
>
> Somewhat inaccurate: they contributed to the development of SPARQL.
>
> SPARQL is SQL for the new Web-scale DBMS frontier. It's extremely 
> powerful and utterly useful :-)
>
>
> Kingsley

Links

1. http://bit.ly/Wx1v4n -- BBC article extracts using SpazioDati NLP 
service.
2. http://bit.ly/T055Hj -- same thing using AlchemyAPI NLP service.
3. http://bit.ly/VrNCcY -- same thing using DBpedia Spotlight NLP service.
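
For readers who want a concrete picture of what "NLP aiding SPARQL generation" can mean at its very simplest, here is a minimal sketch in Python. It is not how Quepy works internally; it only matches a question against a couple of hand-written regular-expression templates and fills in a SPARQL query string. The templates, the function name question_to_sparql, and the choice of the dbo:leader and dbo:capital properties are all assumptions made for illustration. Mapping "president" to a generic "leader" property mirrors the behaviour Doug observed.

```python
import re

# Hypothetical question templates. The DBpedia property chosen for each
# template is an assumption for illustration, not taken from Quepy.
TEMPLATES = [
    (re.compile(r"^who is the (?:president|leader) of (?:the )?(?P<name>.+?)\??$", re.I),
     "dbo:leader"),
    (re.compile(r"^what is the capital of (?:the )?(?P<name>.+?)\??$", re.I),
     "dbo:capital"),
]

PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
)


def question_to_sparql(question):
    """Return a SPARQL query string for a recognised question, else None."""
    text = question.strip()
    for pattern, prop in TEMPLATES:
        match = pattern.match(text)
        if match:
            # Escape double quotes so the label stays a valid SPARQL literal.
            label = match.group("name").replace('"', '\\"')
            return (
                PREFIXES
                + "SELECT DISTINCT ?answer WHERE {\n"
                + f'  ?entity rdfs:label "{label}"@en .\n'
                + f"  ?entity {prop} ?answer .\n"
                + "}"
            )
    return None


if __name__ == "__main__":
    print(question_to_sparql("Who is the president of the UK?"))
```

The sketch only builds the query text; sending it to a SPARQL endpoint is left out. A question that matches no template simply yields None, which is roughly the failure mode John describes for questions outside the supported patterns.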

--

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
