On Mar 30, 2008, at 11:26 PM, John F. Sowa wrote:
> Pat C. and John B.,
> I'll accept parts of what both of you are saying, but with
> many qualifications. Before getting to the qualifications,
> I'll quote an example I've used before.
> Following are four sentences that use the same verb in
> a similar syntactic pattern, but with very different,
> highly domain-dependent senses:
>   1. Tom supported the tomato plant with a stick.
>   2. Tom supported his daughter with $20,000 per year.
>   3. Tom supported his father with a decisive argument.
>   4. Tom supported his partner with a bid of 3 spades.
> Making those choices requires quite a bit of background knowledge,
> and it's definitely nontrivial with current technology. But the
> next question is what to do with that choice. It might be useful
> in machine translation for picking the correct verb in some target
> language. Perhaps a statistical translator with enough data could
> do so. But could that be called "understanding"?
Well, there's no problem with calling this a modicum of understanding,
though not complete understanding...
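The choice among the four senses of "supported" can be sketched, very roughly, as matching domain cues in the instrument phrase. Everything below (the sense labels, the cue lists, the matching rule) is a hypothetical toy, not anyone's actual analyzer; it only illustrates how domain-dependent the decision is:

```python
# Toy word-sense chooser for "support" (all names and cues hypothetical).
# Each sense is keyed by surface cues expected in the "with ..." phrase.
SENSE_CUES = {
    "physical":   {"stick", "stake", "pole"},
    "financial":  {"$", "dollar", "income"},
    "rhetorical": {"argument", "evidence", "claim"},
    "bridge":     {"bid", "spades", "hearts", "trump"},
}

def disambiguate_support(instrument_phrase: str) -> str:
    """Return the sense whose cues best overlap the instrument phrase."""
    words = set(instrument_phrase.lower().replace(",", "").split())
    best, best_overlap = "unknown", 0
    for sense, cues in SENSE_CUES.items():
        overlap = sum(1 for w in words if any(c in w for c in cues))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

print(disambiguate_support("a stick"))            # physical
print(disambiguate_support("a bid of 3 spades"))  # bridge
```

Of course, real cases need far more than surface cues, which is exactly the point of the example: nothing in the words themselves says that "3 spades" belongs to card games.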
> Suppose we had a word-expert analyzer with an enormous amount
> of information about each verb. Would that be the best way to
> organize the knowledge base? Would you put some knowledge about
> bridge or tomatoes into some rules for each verb, noun, and
> adjective that might refer to bridge or to tomatoes? Or would
> it be better to put all the knowledge about bridge in a module
> that deals with bridge and all the knowledge about tomatoes in
> a module that deals with tomatoes?
Any which way; let it even be inefficient. But we need multiply
cross-indexed descriptions of complex events, with their subevents
and participants, pre- and post-conditions, and other properties.
We are building such entities for a few projects we are working on,
and, of course, it is a slow and painful task with many corrections.
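The two organizations posed in the question can be contrasted in a toy sketch. All the entries below are hypothetical glosses, standing in for whatever real knowledge structures a system would use:

```python
# Toy contrast of the two knowledge organizations (all entries hypothetical).

# (a) Word-expert style: each word carries rules for every domain it touches.
word_experts = {
    "support": {
        "gardening": "hold a plant upright with a physical prop",
        "bridge": "sustain a suit that your partner has bid",
    },
}

# (b) Subject-matter modules: each domain bundles all of its own knowledge,
# and lexical entries merely point into the relevant module.
domain_modules = {
    "gardening": {
        "support": "hold a plant upright with a physical prop",
        "stake": "a prop driven into the ground beside a plant",
    },
    "bridge": {
        "support": "sustain a suit that your partner has bid",
        "bid": "an undertaking to win a stated number of tricks",
    },
}

# Either organization can answer the same query; what differs is where
# newly acquired domain knowledge has to be written down.
assert word_experts["support"]["bridge"] == domain_modules["bridge"]["support"]
print("both organizations answer the query identically")
```

The cross-indexing mentioned above is precisely what lets one set of descriptions serve both access paths, by words and by subject matter.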
> With either way of organizing the knowledge -- by words or
> by subject matter -- how would you relate the lexical info
> about each word to the ontology and to the background
> knowledge about how to play bridge or work in the garden?
The organization in our approach is by (ontological) elements of
world knowledge, but our lexicon expresses lexical meaning in terms
of the ontological metalanguage (there are exceptions, but discussing
them is well beyond the grain size of this message). So, in the
example above, the ontology will contain the event describing what
happens when people play bridge, and the lexicon will record any
idiosyncratic word and phrase senses relating to bridge playing.
Many meanings will still be derived compositionally, with knowledge
of the complex event of playing bridge serving as a (core) heuristic
for making preferences during ambiguity resolution.
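As a rough illustration of that division of labor, here is a minimal sketch in which a complex-event frame steers sense preference. The frame, the sense names, and the lookup procedure are all hypothetical stand-ins, not the actual resources described above:

```python
# Hypothetical ontology fragment: a complex event for bridge playing.
ontology = {
    "PLAY-BRIDGE": {
        "is-a": "CARD-GAME-EVENT",
        "subevents": ["DEAL", "BID", "PLAY-TRICKS", "SCORE"],
        "participants": {"agent": "HUMAN", "partner": "HUMAN"},
    },
}

# Hypothetical lexicon: senses defined in the ontological metalanguage,
# with the bridge sense marked as idiosyncratic to PLAY-BRIDGE.
lexicon = {
    "support": [
        {"sense": "SUPPORT-PHYSICALLY", "domain": None},    # default sense
        {"sense": "SUPPORT-BID", "domain": "PLAY-BRIDGE"},  # idiosyncratic
    ],
}

def prefer_sense(word, active_events):
    """Prefer a sense tied to an active complex event; else the default."""
    for entry in lexicon[word]:
        if entry["domain"] in active_events:
            return entry["sense"]
    return lexicon[word][0]["sense"]

# "with a bid of 3 spades" would activate PLAY-BRIDGE, steering the choice:
print(prefer_sense("support", {"PLAY-BRIDGE"}))  # SUPPORT-BID
print(prefer_sense("support", set()))            # SUPPORT-PHYSICALLY
```

The point of the sketch is only the flow of information: domain knowledge lives in the ontology, the lexicon points into it, and the active complex event supplies the preference heuristic.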
BTW, there are many more kinds of ambiguity to deal with in
addition to word sense or PP attachment (to name two that have
been at the center of the field's attention): scope ambiguities,
referential ambiguities, semantic dependency ambiguities,
ambiguities arising from non-literal language, etc.
> If you intend to use logic, how much logic would be needed
> for those sentences? What would a theorem prover do to aid
> understanding? Could proving some theorem about tomatoes be
> considered understanding?
A theorem prover can be adapted to drive
the ambiguity resolution process.
However, the main issue is that we need to be able to make
successful inferences against a knowledge base that is neither
sound nor complete. That's reality. So, if logic can come up
with methods that support such a task, great. Otherwise, we
scruffies will have to make do with whatever we can muster.
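One minimal "scruffy" sketch of what inference over such a knowledge base might look like: facts and rules carry confidences rather than truth guarantees, and the system accepts the best-supported conclusion instead of demanding proof. All the predicates and numbers below are made up for illustration:

```python
# Hypothetical confidence-weighted facts: (predicate, entity) -> confidence.
facts = {("bird", "tweety"): 0.9, ("penguin", "tweety"): 0.9}

# Hypothetical defeasible rules: (premise, conclusion, rule strength).
rules = [
    ("bird", "flies", 0.8),
    ("penguin", "not-flies", 0.95),
]

def best_conclusion(entity):
    """Score each derivable conclusion by premise confidence * rule strength
    and return the best-supported one, revisable if better evidence arrives."""
    scores = {}
    for premise, conclusion, strength in rules:
        conf = facts.get((premise, entity), 0.0)
        if conf:
            scores[conclusion] = max(scores.get(conclusion, 0.0), conf * strength)
    return max(scores, key=scores.get) if scores else None

print(best_conclusion("tweety"))  # not-flies
```

Nothing here is sound in the logician's sense: the conflicting conclusion "flies" is also derivable, just outweighed. That is the trade the sketch is meant to make visible.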
> Is it likely that a bunch of people (similar to the Wikipedians)
> would be willing and able to enter the kinds of knowledge in the
> kinds of formats necessary for a system that understands?
I think that hoping that something like this will be done by
enthusiasts is, err, premature. As the saying goes, you get
what you pay for (and this is not - entirely - a cynical view :-) ).
Realistically, though, the field doesn't have the funding for this kind of effort.
> The Cyc project has been paying professional knowledge engineers
> to enter such knowledge into their system for the past 22 years.
> They had two million axioms in 2004, but Cyc still can't read
> a book in order to build up its knowledge base. How much more
> would be needed? Or is there some better way? What way?
As far as I know, many of the Cyc axioms are actually facts (e.g., knowledge about
Austin, TX), not concepts (knowledge about cities in general). Also, it
is instructive that they seem to be using the knowledge base
only for statistical NLP (my information may be wrong here, though!).
As for reading a book, we have started a project that uses our
current, limited, ontology/lexicon/grammar/preprocessing resources
to extract knowledge of unknown concepts/words from the web.
But it's just scraping the surface, however exciting the project may be...
There is much, much more that can be said, of course. It would be
nice to talk about this in person rather than over e-mail...