On 2/10/11 11:27 AM, John F. Sowa wrote:
> Thanks for the reminder:
>> Dave Ferrucci gave a talk on UIMA (the Unstructured Information
>> Management Architecture) back in May-2006, entitled: "Putting the
>> Semantics in the Semantic Web: An overview of UIMA and its role in
>> Accelerating the Semantic Revolution"
> I recommend that readers compare Ferrucci's talk about UIMA in
> 2006 with his talk about the Watson system and Jeopardy in 2011.
> In less than 5 years, they built Watson on the UIMA foundation,
> which contained a reasonable number of NLP tools, a modest ontology,
> and some useful tools for knowledge acquisition. During that time,
> they added quite a bit of machine learning, reasoning, statistics,
> and heuristics. But most of all, they added terabytes of documents.
> For the record, following are Ferrucci's slides from 2006:
> Following is the talk that explains the slides:
> And following is his recent talk about the DeepQA project for
> building and extending that foundation for Jeopardy:
> Compared to Ferrucci's talks, the PBS Nova program was a disappointment.
> It didn't get into any technical detail, but it did have a few cameo
> appearances from AI researchers. Terry Winograd and Pat Winston,
> for example, said that the problem of language understanding is hard.
> But I thought that Marvin Minsky and Doug Lenat said more with their
> tone of voice than with their words. My interpretation (which could,
> of course, be wrong) is that both of them were seething with jealousy
> that IBM built a system that was competing with Jeopardy champions
> on national TV -- and without their help.
> In any case, the Watson project shows that terabytes of documents are
> far more important for commonsense reasoning than the millions of
> formal axioms in Cyc. That does not mean that the Cyc ontology is
> useless, but it undermines the original assumption behind the Cyc
> project: that commonsense reasoning requires a huge knowledge base
> of hand-coded axioms together with a powerful inference engine.
> An important observation by Ferrucci: The URIs of the Semantic Web
> are *not* useful for processing natural languages -- not for ordinary
> documents, not for scientific documents, and especially not for
> Jeopardy questions:
> 1. For scientific documents, words like 'H2O' are excellent URIs.
> Adding an http address in front of them is pointless.
> 2. A word like 'water', which is sometimes a synonym for 'H2O',
> has an open-ended number of senses and microsenses.
> 3. Even if every microsense could be precisely defined and
> cataloged on the WWW, that wouldn't help determine which
> one is appropriate for any particular context.
> 4. Any attempt to force human beings to specify or select
> a precise sense cannot succeed unless *every* human
> understands and consistently selects the correct sense
> at *every* possible occasion.
> 5. Given that point #4 is impossible to enforce and dangerous
> to assume, any software that uses URIs will have to verify
> that the selected sense is appropriate to the context.
> 6. Therefore, URIs found "in the wild" on the WWW can never
> be assumed to be correct unless they have been guaranteed
> to be correct by a trusted source.
> These points taken together imply that annotations on documents
> can't be trusted unless (a) they have been generated by your
> own system or (b) they were generated by a system which is at
> least as trustworthy as your own and which has been verified
> to be 100% compatible with yours.
> In summary, the underlying assumptions for both Cyc and
> the Semantic Web need to be reconsidered.
> John
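John's points #3-#5 can be made concrete with a toy sketch. All of the URIs and gloss keywords below are hypothetical, and the overlap scoring is a crude Lesk-style heuristic, not anyone's production method: even if every microsense of 'water' had its own URI, software would still have to score the candidate senses against the surrounding context.

```python
# Hypothetical sense catalog: URI -> gloss keywords.
# The URIs are illustrative only; they do not resolve anywhere.
SENSES = {
    "http://example.org/sense/water-H2O":  {"chemical", "compound", "h2o", "molecule"},
    "http://example.org/sense/water-body": {"lake", "river", "sea", "swim"},
    "http://example.org/sense/water-verb": {"plant", "garden", "irrigate"},
}

def disambiguate(context_words):
    """Return the sense URI whose gloss overlaps the context most."""
    ctx = {w.lower() for w in context_words}
    return max(SENSES, key=lambda uri: len(SENSES[uri] & ctx))

# A chemistry context selects the H2O sense; a recreation context
# selects the body-of-water sense. The URI alone decides nothing.
print(disambiguate("the molecule of water is a chemical compound".split()))
print(disambiguate("swim in the lake".split()))
```

The point of the sketch is John's point #5: whatever catalog of senses exists, the verification step against context cannot be skipped.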
URIs (as you know) are simply identifiers that may or may not resolve.
In the context of Linked Data they serve as "Super Keys". Once loaded
into a DBMS, they behave just like any other super key with respect to
internal data.
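A minimal sketch of the "Super Key" point, using an in-memory SQLite table (the table layout and the single row are illustrative assumptions): once the URI sits in a key column, the DBMS treats it as an opaque unique value, whether or not it resolves on the Web.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entity (
        uri  TEXT PRIMARY KEY,   -- the URI is just a key here
        name TEXT
    )""")
conn.execute("INSERT INTO entity VALUES (?, ?)",
             ("http://dbpedia.org/resource/Water", "Water"))

# Lookup by URI works exactly like lookup by any other unique key;
# nothing here ever dereferences the URI over HTTP.
row = conn.execute(
    "SELECT name FROM entity WHERE uri = ?",
    ("http://dbpedia.org/resource/Water",)).fetchone()
print(row[0])
```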
I can tell you this for sure, now that Chris Welty from the Watson
project team has confirmed to me that IBM is okay with me talking about
the DBpedia and LOD aspects:
DBpedia and many other datasets from the LOD cloud provided great
sources of facts to Watson.
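To illustrate what "sources of facts" means in Linked Data terms, here is a toy fact store of subject-predicate-object triples in DBpedia style. The triples and prefixes below are simplified stand-ins I made up for illustration, not verbatim DBpedia data, and the lookup is far simpler than anything Watson does:

```python
# Hypothetical DBpedia-style triples (subject, predicate, object).
TRIPLES = [
    ("dbr:Toronto", "dbo:country", "dbr:Canada"),
    ("dbr:Toronto", "rdf:type",    "dbo:City"),
    ("dbr:Canada",  "dbo:capital", "dbr:Ottawa"),
]

def facts_about(subject):
    """Collect (predicate, object) pairs for a subject,
    the way a QA system might gather candidate evidence."""
    return {(p, o) for s, p, o in TRIPLES if s == subject}

print(facts_about("dbr:Toronto"))
```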
Watson learns by consuming a broad array of data sources, building from
them the sophisticated graphs that drive its question answering.
Watson is a demonstration of what can be achieved when you combine NLP
and deductive DBMS technology -- an effort to which Linked Data and many
other Semantic Web project outputs contribute.
Like you, I am no fan of the provincial aspects of "The Semantic Web"
meme. That said, I do believe these kinks are being ironed out, so
expect more inclusiveness and acceptance of constructive criticism and
feedback. Meanwhile, Watson demonstrates that we are getting closer to
the desired destination: a much smarter computing space driven by
combining the best of NLP, structured Linked Data, and DBMS technology.
As I said earlier, DBpedia and other LOD cloud datasets really helped
Watson up the ante. All things being equal (modulo Murphy, basically), I
am expecting Watson to win next week's Jeopardy competition :-)
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
> To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
Twitter/Identi.ca: kidehen