Mike, Krzysztof, Pavithra, Doug, and Christopher, (01)
I'd like to make a few comments, starting with Mike's point: (02)
MB
> From what little I've seen it does rather seem to be a triumph
> of statistics over semantics. (03)
Statistics certainly plays a big role, but the relative role
depends on what you mean by 'semantics'. I would say that Watson
represents a triumph of a 1980s style of NLP, and I'd summarize
the influences in one line: (04)
Michael McCord + Roger Schank + statistics + supercomputer (05)
For McCord's influence, I'll quote the following passage from the
article in AI Magazine: (06)
Page 11, http://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf
> The DeepQA approach encourages a mixture of experts at this
> stage, and in the Watson system we produce shallow parses,
> deep parses (McCord 1990), logical forms, semantic role labels,
> coreference, relations, named entities, and so on, as well as
> specific kinds of analysis for question answering. (07)
In the 1980s, McCord had written an excellent parser in Prolog
with a good grammar of English. In the 1990s, he rewrote the
parser in C for better performance. Apparently, that's the
"deep" parser they use for Watson. (08)
Also in the 1980s, Roger Schank made the claim that axioms in
logic are irrelevant, but large volumes of background knowledge
are essential for language understanding. He also said that
machine learning is essential for NLP, since every text says
something new that must be added to the background knowledge.
Unfortunately, the technology at the time was too slow, and
the available resources of machine-readable texts were woefully
inadequate to support Schank's claims. (09)
Watson also uses ideas developed in the 1990s and later, but
one could say that the Watson style of semantics is a high-speed
implementation of a Schankian style of NLP. The crucial technology
that Schank did not have is a supercomputer plus huge volumes of
preprocessed material indexed and accessible via a relational DB. (010)
I certainly won't downplay the importance of statistics, which
are essential for many aspects of Watson: evaluation of what
is relevant, estimating the confidence in an answer, and most
especially for techniques of machine learning. But learning
is also one of the aspects that Schank emphasized years ago. (011)
In short, Roger Schank's emphasis on informal methods of
processing and using large volumes of background knowledge,
case-based reasoning, and machine learning is much closer
to what Watson is doing than any logic-based method --
either Richard Montague's formal logic for NLP or Lenat's
enormous formal ontology for Cyc. (012)
Watson does use some logic, however. It's just not the
main focus. Statistics and heuristics are more important. (013)
KJ
> As Watson turns out to be very successful this leads again to the
> question which role ontological categories play... (014)
Watson does use WordNet and other lexical resources. That is
important for selectional constraints on permissible combinations,
but WordNet and similar resources have very few axioms. (015)
PK
> Too bad Watson could not do any Google search..
> and get better answers.. (016)
DF
> This was not the problem.
>
> Watson has Wikipedia in its memory. The answers to the specified
> question are in Wikipedia: Toronto being a Canadian city, Chicago
> being a US city, Chicago having O'Hare and Midway as airports,
> O'Hare being named after a WW II hero (flying ace), Midway being the
> name of a WW II battle. (017)
I strongly agree with Doug. Watson has predigested 15 terabytes
of background knowledge, which is tailored and indexed for its own
representations. Google's search methods are much less precise,
and they return only entire documents, which Watson would have
to spend too much time reading and analyzing before it could answer
a Jeopardy question. (018)
DF
> Question answering by a machine such as Watson is not very useful unless
> the system can explain its answers. Thus i am disappointed that IBM's
> "explanation" of its error in Final Jeopardy does not explain why it
> chose its answer. (019)
I agree. But the immediate task for the Jeopardy challenge did not
require explanations. Much more would have to be added, revised, and
extended before the Watson technology could be applied to other domains. (020)
From what I've read about Watson, I think that it could keep a backchain
of steps that lead to any conclusion. Keeping the entire derivation
tree of all rejected steps would be too voluminous. But Watson could
keep the supporting information for each step of the main thread. It
could also keep a summary of each rejected option leading away from
the main thread. (021)
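As a rough illustration of that idea (not a description of how Watson
or any IBM code actually works), here is a minimal Python sketch.
The names Step, Derivation, add, and explain are hypothetical, and the
source labels in the usage example are placeholders. Each step on the
main thread keeps its full supporting sources, while rejected options
are kept only as one-line summaries:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One step on the main thread of a derivation (hypothetical structure)."""
    conclusion: str
    sources: List[str]                                  # supporting passages or document labels
    rejected: List[str] = field(default_factory=list)   # one-line summaries only


@dataclass
class Derivation:
    """Backchain of main-thread steps, in the order they were derived."""
    steps: List[Step] = field(default_factory=list)

    def add(self, conclusion, sources, rejected=()):
        self.steps.append(Step(conclusion, list(sources), list(rejected)))

    def explain(self):
        """Return a readable trace, walking back from the final conclusion."""
        lines = []
        for n, step in enumerate(reversed(self.steps), start=1):
            lines.append(f"{n}. {step.conclusion}")
            lines.append(f"   sources: {', '.join(step.sources)}")
            for summary in step.rejected:
                lines.append(f"   rejected: {summary}")
        return "\n".join(lines)


if __name__ == "__main__":
    d = Derivation()
    d.add("O'Hare is named after a WW II hero", ["Wikipedia (placeholder label)"])
    d.add("Midway is the name of a WW II battle", ["Wikipedia (placeholder label)"])
    d.add("Chicago has O'Hare and Midway as airports",
          ["Wikipedia (placeholder label)"],
          rejected=["Toronto: a Canadian city, not a US city"])
    print(d.explain())
```

The storage cost grows with the length of the main thread, not with
the size of the full search tree, which is the trade-off described above.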
Our VivoMind system, which also reads untagged English documents
in order to answer users' queries, maintains the backchain of
derivations, and it can display the sources from which any answer
was derived. See slides 32 to 40 of the following presentation: (022)
http://www.jfsowa.com/talks/pursue.pdf (023)
Slide 41 discusses how that technology could be adapted to
diagnosing cancer patients. (024)
CS
> One exchange [with Ken Jennings] that seems very relevant to
> this Forum is this one:
>
> Q: It was interesting to me that the "Which decade" category seemed
> especially hard for Watson. Why was that?
>
> A: I think it took it a little while to figure out that the answers
> would all be decades. This seems incredible to a human playing
> along at home, but basic contextual issues like this are incredibly
> hard for machine intelligence to master. The cool thing about
> Watson is it learns from those mistakes. By the end of the
> category, it had learned that the answers were going to be decades,
> and adjusted accordingly. (025)
Ferrucci explained that Watson was designed to avoid using the
category names to evaluate answers because many Jeopardy categories
have misleading puns and wordplay. One example was the category named
"Church and State" for which the answer was "Christchurch, New Zealand."
Watson got that right, even though the category name could have been confusing. (026)
However, some category names are very relevant -- "Which decade" and
"U.S. Cities", for example. The humans Ken and Brad correctly used
that information, but Watson made serious blunders by ignoring it. (027)
This is a case in which the Watson developers deliberately
ignored important information. They should have used machine
learning to let Watson analyze whether or not a particular kind
of category name might be more useful or more distracting. (028)
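To make that suggestion concrete, here is a minimal Python sketch of
one very simple way to learn how much to trust a category name. It is
an assumption for illustration only, not Watson's method: the class
CategoryHintWeight, the function rescore, the pseudo-count prior, and
the candidate answers and confidences in the usage example are all
hypothetical. The estimate is just a smoothed fraction of revealed
answers that fit the type suggested by the category name:

```python
class CategoryHintWeight:
    """Running estimate of how often a category name predicts the answer type."""

    def __init__(self, prior_hits=1.0, prior_misses=1.0):
        # Pseudo-counts: with 1 and 1 the weight starts undecided at 0.5.
        self.hits = prior_hits
        self.misses = prior_misses

    def update(self, answer_matched_hint):
        """Record whether a revealed correct answer fit the category hint."""
        if answer_matched_hint:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def weight(self):
        return self.hits / (self.hits + self.misses)


def rescore(candidates, matches_hint, hint):
    """Scale each base confidence by how reliable the hint has been so far.

    candidates:   dict mapping answer text to a base confidence in [0, 1]
    matches_hint: function answer -> bool (does it fit the category name?)
    """
    w = hint.weight
    return {answer: conf * (w if matches_hint(answer) else 1.0 - w)
            for answer, conf in candidates.items()}


if __name__ == "__main__":
    hint = CategoryHintWeight()
    for _ in range(3):              # three revealed answers were all decades
        hint.update(True)

    def is_decade(answer):
        return answer.endswith("0s")

    candidates = {"the 1920s": 0.40, "Chicago": 0.50}   # made-up values
    print(hint.weight)
    print(rescore(candidates, is_decade, hint))
```

For a punning category such as "Church and State", the misses would
accumulate and the weight would drop below 0.5, so the hint would
count against itself instead of being ignored outright.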
John (029)