ontolog-forum

Re: [ontolog-forum] Natural Language based SPARQL Generator

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Thu, 31 Jan 2013 09:28:06 -0500
Message-id: <510A7F76.7070603@xxxxxxxxxxxxxx>
On 1/30/13 11:16 PM, John F Sowa wrote:
> On 1/30/2013 1:00 PM, Kingsley Idehen wrote:
>> Quick FYI re: http://quepy.machinalis.com/ .
>>
>> Examples:
>>
>> 1. What is the capital of Nigeria? --http://bit.ly/WvXAIb.
>> 2. Who is the President of Nigeria? --http://bit.ly/XIsc5x.
>> 3. What is the population of Nigeria? --http://bit.ly/XQbAJM.
> I was *extremely* unimpressed by that system.

Remember, this is a bare-bones system. It's open source [1] and
extensible. For instance, it only queries DBpedia, whereas if it used
the LOD cloud cache it would have a much broader knowledge base to draw
on. In addition, it isn't making use of inference, since that requires
explicitly triggering the inference engine via SPARQL pragmas or using
advanced SPARQL features such as property paths [2].
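As a rough illustration of the kind of inference a property path inlines into a query, here is a minimal pure-Python sketch that evaluates the path `?x rdf:type/rdfs:subClassOf* :Person` over a tiny in-memory set of triples. The triples, class names and the `superclasses`/`instances_of` helpers are hypothetical; in practice the SPARQL engine evaluates the path itself.

```python
from collections import deque

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical toy data: no triple states directly that :alice is a :Person.
TRIPLES = {
    (":alice", RDF_TYPE, ":Physicist"),
    (":Physicist", SUBCLASS_OF, ":Scientist"),
    (":Scientist", SUBCLASS_OF, ":Person"),
    (":bob", RDF_TYPE, ":Person"),
}


def superclasses(cls, triples):
    """Classes reachable from cls via rdfs:subClassOf* (zero or more steps)."""
    seen = {cls}  # zero steps: the class itself
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


def instances_of(target, triples):
    """Subjects matching ?x rdf:type/rdfs:subClassOf* target."""
    return {
        s
        for s, p, o in triples
        if p == RDF_TYPE and target in superclasses(o, triples)
    }


print(sorted(instances_of(":Person", TRIPLES)))  # [':alice', ':bob']
```

Without the `*` step only `:bob` would be returned; the path lets `:alice` be inferred as a `:Person` through two subclass links.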

I posted the reference to this system to demonstrate the production of
SPARQL from natural language constructs. This kind of endeavor was
attempted in the past with SQL, and those attempts never even matched
the basic capability demonstrated by this system. Of course, I would
happily look at any SQL-based system, accessible online, that refutes
my claim by standing up to my basic testing.
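To make "production of SPARQL from natural language constructs" concrete, here is a deliberately naive sketch of template-based generation: a regular expression matches questions of the form "What is the <property> of <entity>?" and fills in a SPARQL template. This is not Quepy's API; the `question_to_sparql` function, the regex and the property mapping (`dbpedia-owl:capital`, `dbpedia-owl:populationTotal`) are assumptions chosen for illustration.

```python
import re

# Hypothetical mapping from English property words to DBpedia ontology terms.
PROPERTY_MAP = {
    "capital": "dbpedia-owl:capital",
    "population": "dbpedia-owl:populationTotal",
}

# Matches e.g. "What is the capital of Nigeria?"
PATTERN = re.compile(r"^What is the (\w+) of ([A-Z][\w ]*?)\?$")

# Doubled braces are literal braces after str.format().
TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?x1 WHERE {{
  ?x0 rdfs:label "{entity}"@en.
  ?x0 {prop} ?x1.
}}"""


def question_to_sparql(question):
    """Return a SPARQL query string, or None if no template matches."""
    match = PATTERN.match(question.strip())
    if match is None:
        return None
    word, entity = match.groups()
    prop = PROPERTY_MAP.get(word.lower())
    if prop is None:
        return None  # property word we have no mapping for
    return TEMPLATE.format(entity=entity, prop=prop)


print(question_to_sparql("What is the capital of Nigeria?"))
print(question_to_sparql("Who was Einstein?"))  # None: no template matches
```

A real generator needs many such templates plus linguistic analysis; the point is only that the output is a query, not an answer.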

I did not present this as an excellent answer service. It's a tool for
generating SPARQL.

>
> I typed each of those three sentences to Google and got the
> answer just from the brief excerpts quoted in the first
> few hits.

Please post the URLs of the Google responses.

Here is a Google URL for the question: Who is the president of Nigeria?
<http://www.google.com/search?client=safari&rls=en&q=who+is+the+president+of+nigeria&ie=UTF-8&oe=UTF-8>

The problem with the response is that it's giving me a report (data
contextualized by a document) that doesn't return the actual identifier
that denotes the president of Nigeria. I can't use Google's output in a
program to construct or navigate a graph as part of a knowledge
processing pipeline. That's the problem with Google's approach.
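The difference shows up in the result format. A SPARQL endpoint can return results in the W3C SPARQL 1.1 Query Results JSON Format, where each bound value carries a `type` of `uri`, `literal` or `bnode`. The sketch below keeps only URI bindings, i.e. identifiers a program can dereference or join on. The sample response is hypothetical: the `example.org` URI is a placeholder, not real query output, and `uri_bindings` is an illustrative helper.

```python
import json

# Hypothetical response in the SPARQL 1.1 Query Results JSON Format.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["person", "name"]},
  "results": {"bindings": [
    {"person": {"type": "uri", "value": "http://example.org/resource/Some_Person"},
     "name": {"type": "literal", "value": "Some Person", "xml:lang": "en"}}
  ]}
}
"""


def uri_bindings(response_text, var):
    """Collect URI values bound to `var`, skipping literals and blank nodes."""
    data = json.loads(response_text)
    uris = []
    for row in data["results"]["bindings"]:
        term = row.get(var)  # a variable may be unbound in some rows
        if term is not None and term["type"] == "uri":
            uris.append(term["value"])
    return uris


print(uri_bindings(SAMPLE_RESPONSE, "person"))
# ['http://example.org/resource/Some_Person']
print(uri_bindings(SAMPLE_RESPONSE, "name"))  # []
```

The URI in the first result is something a pipeline can feed into the next query; the literal name is only text, much like a search-engine snippet.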

The same question posed to Wolfram Alpha gets the answer, but once again
without an identifier that denotes the president of Nigeria:
<http://www.wolframalpha.com/input/?i=who+is+the+president+of+Nigeria> .

> I didn't even have to click on any of the URLs.
The URIs are the issue here. We want to query an HTTP-accessible data
space (comprising entity-relationship graphs) and be able to
incorporate super keys (URIs) into the query result set. In addition, we
want to be able to share query results and their definitions via
hyperlinks (URIs).
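Sharing a query as a hyperlink works because the SPARQL 1.1 Protocol lets a query be sent as an HTTP GET with the query text in a URL-encoded `query` parameter, so the URL itself names the result set. A minimal standard-library sketch, assuming DBpedia's public endpoint at `http://dbpedia.org/sparql`; the `query_link` helper is hypothetical and nothing is fetched here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """SELECT DISTINCT ?x1 WHERE {
  ?x0 <http://www.w3.org/2000/01/rdf-schema#label> "Nigeria"@en.
  ?x0 <http://dbpedia.org/ontology/capital> ?x1.
}"""


def query_link(endpoint, query):
    """Build a SPARQL Protocol GET URL whose `query` parameter holds the query."""
    return endpoint + "?" + urlencode({"query": query})


link = query_link(ENDPOINT, QUERY)
print(link)

# The link round-trips: decoding the parameter yields the original query text.
decoded = parse_qs(urlsplit(link).query)["query"][0]
assert decoded == QUERY
```

Anyone holding the link can re-run the query or read its definition back out of the URL, which is what shortened links to query results and query definitions rely on.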

>
> Then I typed the following sentences to both Quepy and Google.
> In each case, I typed the full sentence to Quepy & did a cut
> and paste of exactly the same string to Google.  Since Quepy
> would complain about capitalization, I was careful about that:
>
>    1. Who was Einstein?
>
>    2. What did Einstein do?
>
>    3. When did Lindbergh cross the Atlantic?
>
>    4. What is the chemical formula for acetone?
>
>    5. When did the Revolutionary War end?

See my comments above.

The query Quepy generated for "Who was Einstein?" reads:

## query start ##
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX quepy: <http://www.machinalis.com/quepy#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?x1 WHERE {
   ?x0 rdf:type foaf:Person.
   ?x0 rdfs:label "Einstein"@en.
   ?x0 rdfs:comment ?x1.
}

## query end ##

That's just basic query generation, which can easily be fixed via the
toolset (which is openly available to anyone). Note that 'Albert
Einstein' could denote any of several entities; there is a notability
factor implicit in your Google searches, i.e., you've already concluded
which 'Albert Einstein' you seek information about.
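To see why a bare label match is fragile, consider a hypothetical label index in which several distinct entities carry similar labels. An exact match on "Einstein" (as in the generated `rdfs:label "Einstein"@en` pattern) picks up only an entity labelled exactly that, while an all-tokens AND match returns several candidates, so some further signal (type, category, notability) still has to choose among them. The index, its `example.org` URIs and both helper functions are made up for illustration.

```python
# Hypothetical label index: several distinct entities share similar labels.
LABELS = {
    "http://example.org/resource/Albert_Einstein": "Albert Einstein",
    "http://example.org/resource/Albert_Einstein_(film)": "Albert Einstein",
    "http://example.org/resource/Einstein_(crater)": "Einstein",
}


def exact_match(label):
    """Entities whose label equals `label` exactly."""
    return sorted(uri for uri, text in LABELS.items() if text == label)


def all_tokens_match(*tokens):
    """Entities whose label contains every token (case-insensitive AND)."""
    wanted = [t.lower() for t in tokens]
    return sorted(
        uri
        for uri, text in LABELS.items()
        if all(t in text.lower().split() for t in wanted)
    )


print(exact_match("Einstein"))
# ['http://example.org/resource/Einstein_(crater)']
print(all_tokens_match("ALBERT", "EINSTEIN"))
# ['http://example.org/resource/Albert_Einstein',
#  'http://example.org/resource/Albert_Einstein_(film)']
```

Neither strategy alone singles out the physicist; that choice comes from context the user brings, which a search engine's ranking supplies implicitly.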

A SPARQL query that factors in disambiguation [3][4][5] would look
something like the following:

## query start ##

  SELECT ?s1 AS ?c1,
         ( bif:search_excerpt ( bif:vector ( 'ALBERT', 'EINSTEIN' ), ?o1 ) ) AS ?c2,
         ?sc,
         ?rank,
         ?g
  WHERE
   {
     {
       SELECT ?s1,
              ( ?sc * 3e-1 ) AS ?sc,
              ?o1,
              ( sql:rnk_scale ( <LONG::IRI_RANK> ( ?s1 ) ) ) AS ?rank,
              ?g
       WHERE
         {
           QUAD MAP virtrdf:DefaultQuadMap
             {
               GRAPH ?g
                 {
                   # Virtuoso full-text match against ?o1; ?sc is bound to the match score
                   ?s1 ?s1textp ?o1 .
                   ?o1 bif:contains ' ( ALBERT AND EINSTEIN ) ' option ( score ?sc ) .
                   # Keep only people filed under the Nobel Physics laureates category
                   ?s1 a <http://schema.org/Person> .
                   ?s1 <http://purl.org/dc/terms/subject> ?s2 .
                   FILTER ( ?s2 = <http://dbpedia.org/resource/Category:Nobel_laureates_in_Physics> ) .
                 }
             }
         }
       # Order by weighted text score plus scaled entity rank; keep the top 20
       ORDER BY DESC ( ?sc * 3e-1 + sql:rnk_scale ( <LONG::IRI_RANK> ( ?s1 ) ) )
       LIMIT 20 OFFSET 0
     }
   }

## query end ##
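The ORDER BY clause in the query above combines a weighted full-text score with an entity rank (`?sc * 3e-1 + sql:rnk_scale(<LONG::IRI_RANK>(?s1))`) and keeps the top 20 rows that survive the type and category patterns. The Python sketch below mimics only that filter-then-order step; the `Candidate` records and their `text_score` and `rank` numbers are made up, and Virtuoso's actual `IRI_RANK`/`rnk_scale` computation is not reproduced.

```python
from dataclasses import dataclass

TEXT_WEIGHT = 0.3  # corresponds to the 3e-1 factor in the query
LIMIT = 20
NOBEL_PHYSICS = "Category:Nobel_laureates_in_Physics"


@dataclass
class Candidate:
    uri: str
    text_score: float  # stand-in for ?sc from the free-text match
    rank: float  # stand-in for the scaled entity rank
    is_person: bool
    categories: frozenset


def rank_candidates(candidates, limit=LIMIT):
    """Filter like the query's type/category patterns, then sort by combined score."""
    kept = [c for c in candidates if c.is_person and NOBEL_PHYSICS in c.categories]
    kept.sort(key=lambda c: TEXT_WEIGHT * c.text_score + c.rank, reverse=True)
    return [c.uri for c in kept[:limit]]


# Hypothetical candidates; the numbers are illustrative only.
CANDIDATES = [
    Candidate("http://example.org/resource/Albert_Einstein", 8.0, 9.5, True,
              frozenset({NOBEL_PHYSICS})),
    Candidate("http://example.org/resource/Albert_Einstein_(film)", 9.0, 2.0, False,
              frozenset()),
    Candidate("http://example.org/resource/Some_Laureate", 1.0, 3.0, True,
              frozenset({NOBEL_PHYSICS})),
]

print(rank_candidates(CANDIDATES))
# ['http://example.org/resource/Albert_Einstein',
#  'http://example.org/resource/Some_Laureate']
```

The film entry has the highest text score but is removed by the type and category constraints; among the rest, the entity rank acts as the notability factor.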

To conclude, the key point I sought to make with this post is that
natural-language-based SPARQL generation is an emerging frontier, one
that never opened up on the SQL front in any kind of webby way. Of
course, I would happily look at a live, Web-accessible SQL-based system
to see if it can match the most basic SPARQL functionality demonstrated
by Quepy fronting SPARQL :-)


Links:

1. https://github.com/machinalis/quepy -- Quepy source code.
2. http://bit.ly/UydU9t -- example of inlined inference via SPARQL 1.1 property paths.
3. http://bit.ly/Vxtki9 -- 'Albert Einstein' disambiguated.
4. http://bit.ly/14zcUHB -- SPARQL query results link.
5. http://bit.ly/11lH6kQ -- SPARQL query definition link.


--

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen


_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J