ontolog-forum

Re: [ontolog-forum] IBM Watson on Jeopardy

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: Krzysztof Janowicz <jano@xxxxxxx>
Date: Thu, 10 Feb 2011 12:23:40 -0500
Message-id: <4D541F1C.1020004@xxxxxxx>
John,    (01)

I agree with many interesting observations in your posting but I don't 
think that they contradict Semantic Web or Linked Data research. It 
is rather the role of ontologies and how we plan to develop them that 
needs a reality check.    (02)

The domain of geospatial semantics is a nice study area for this. There 
is no canonical or context-free definition of terms such as river, lake, 
or forest. Hence, there will be no meaningful upper-level ontology of 
such geographic feature types (and we have tried for more than 20 years). 
To start even one step earlier, there are no rivers, lakes, or forests in 
the physical world. Their conceptualization is based on cognition and 
especially social convention. All these types can only be defined in a 
local context (microsenses, to use your terminology), i.e., taking space 
and time into account. If ontologies, however, are restricted to 
modeling the 'real world' in a top-down manner, then they will most 
likely not contribute to projects such as Watson.    (03)

Meaning only emerges within a context by situated simulation (to use 
Barsalou's terminology). I believe that with a growing number of facts 
in the Linked Data cloud the semantic integration problem will become 
much more obvious, and work on semantic translation, alignment, 
induction, similarity, etc., will become more prominent. Projects such as 
DBpedia do a great job in extracting data chunks out of document-driven 
systems such as Wikipedia. However, while extracting data and linking 
them fosters their on-the-fly combination with other data and supports 
their re-usability, it shifts the burden to their interpretation. The 
original documents provided the creation context which is necessary to 
determine which microsense has to be used to understand a specific term. 
In this sense, Linked Data de-contextualizes data. I am very curious to 
see what role the 'ontological layer' will play in the future of Linked 
Data and hope these ontologies will be derived (and personalized) out of 
real data, i.e., bottom-up.    (04)

Best,
Krzysztof    (05)





On 02/10/2011 11:27 AM, John F. Sowa wrote:
> Peter,
>
> Thanks for the reminder:
>
>> Dave Ferrucci gave a talk on UIMA (the Unstructured Information
>> Management Architecture) back in May-2006, entitled: "Putting the
>> Semantics in the Semantic Web: An overview of UIMA and its role in
>> Accelerating the Semantic Revolution"
> I recommend that readers compare Ferrucci's talk about UIMA in
> 2006 with his talk about the Watson system and Jeopardy in 2011.
> In less than 5 years, they built Watson on the UIMA foundation,
> which contained a reasonable amount of NLP tools, a modest ontology,
> and some useful tools for knowledge acquisition.  During that time,
> they added quite a bit of machine learning, reasoning, statistics,
> and heuristics.  But most of all, they added terabytes of documents.
>
> For the record, following are Ferrucci's slides from 2006:
>
> 
>http://ontolog.cim3.net/file/resource/presentation/DavidFerrucci_20060511/UIMA-SemanticWeb--DavidFerrucci_20060511.pdf
>
> Following is the talk that explains the slides:
>
> 
>http://ontolog.cim3.net/file/resource/presentation/DavidFerrucci_20060511/UIMA-SemanticWeb--DavidFerrucci_20060511_Recording-2914992-460237.mp3
>
> And following is his recent talk about the DeepQA project for
> building and extending that foundation for Jeopardy:
>
> 
>http://www-943.ibm.com/innovation/us/watson/watson-for-a-smarter-planet/building-a-jeopardy-champion/how-watson-works.html
>
> Compared to Ferrucci's talks, the PBS Nova program was a disappointment.
> It didn't get into any technical detail, but it did have a few cameo
> appearances from AI researchers.  Terry Winograd and Pat Winston,
> for example, said that the problem of language understanding is hard.
>
> But I thought that Marvin Minsky and Doug Lenat said more with their
> tone of voice than with their words.  My interpretation (which could,
> of course, be wrong) is that both of them were seething with jealousy
> that IBM built a system that was competing with Jeopardy champions
> on national TV -- and without their help.
>
> In any case, the Watson project shows that terabytes of documents are
> far more important for commonsense reasoning than the millions of
> formal axioms in Cyc.  That does not mean that the Cyc ontology is
> useless, but it undermines the original assumptions for the Cyc
> project:  commonsense reasoning requires a huge knowledge base
> of hand-coded axioms together with a powerful inference engine.
>
> An important observation by Ferrucci:  The URIs of the Semantic Web
> are *not* useful for processing natural languages -- not for ordinary
> documents, not for scientific documents, and especially not for
> Jeopardy questions:
>
>    1. For scientific documents, words like 'H2O' are excellent URIs.
>       Adding an http address in front of them is pointless.
>
>    2. A word like 'water', which is sometimes a synonym for 'H2O',
>       has an open-ended number of senses and microsenses.
>
>    3. Even if every microsense could be precisely defined and
>       cataloged on the WWW, that wouldn't help determine which
>       one is appropriate for any particular context.
>
>    4. Any attempt to force human being(s) to specify or select
>       a precise sense cannot succeed unless *every* human
>       understands and consistently selects the correct sense
>       at *every* possible occasion.
>
>    5. Given that point #4 is impossible to enforce and dangerous
>       to assume, any software that uses URIs will have to verify
>       that the selected sense is appropriate to the context.
>
>    6. Therefore, URIs found "in the wild" on the WWW can never
>       be assumed to be correct unless they have been guaranteed
>       to be correct by a trusted source.
>
> These points taken together imply that annotations on documents
> can't be trusted unless (a) they have been generated by your
> own system or (b) they were generated by a system which is at
> least as trustworthy as your own and which has been verified
> to be 100% compatible with yours.
>
> In summary, the underlying assumptions for both Cyc and
> the Semantic Web need to be reconsidered.
>
> John
>
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
> To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
>
>
>    (06)


-- 
Krzysztof Janowicz    (07)

GeoVISTA Center, Department of Geography, 302 Walker Building
Pennsylvania State University, University Park, PA 16802, USA    (08)

Email: jano@xxxxxxx
Webpage: http://www.personal.psu.edu/kuj13/
Semantic Web Journal: http://www.semantic-web-journal.net    (09)

