It’s a pleasure to hear from Sergei on this list.
His group is one of the few left in the US that is using an ontology for NLP.
I find no serious disagreement with any of his points, but would
like to clarify what may be some misunderstandings.
[SN] > First, our main problem is well beyond choosing the
> "right" knowledge representation schema: it is all about
> the content of knowledge, not the format.
Yes, yes. The point of finding a common foundation
ontology that is structured as a Conceptual Defining Vocabulary is precisely to
allow representation of content using a common specification of meaning, so
that the meanings of information created by separate groups can be
automatically translated and interpreted by any system using the common
foundation ontology. This *is* an issue of content. Whether different
formats can be precisely translated depends on their levels of expressiveness
(assuming logical consistency of the content).
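To make that concrete, here is a minimal sketch (the primitives and term definitions are invented for illustration, not taken from COSMO or any real ontology) of how grounding every definition in a shared primitive vocabulary lets independently built terminologies be compared:

```python
# Toy sketch of the "Conceptual Defining Vocabulary" idea: every domain
# concept is defined using only terms from a small shared set of primitives,
# so two independently built vocabularies can be compared through their
# primitive expansions. All names here are hypothetical.

PRIMITIVES = {"object", "animate", "human", "move", "self", "fast", "on-land"}

# Two groups define their own domain terms, each grounded in the primitives.
group_a = {"runner": {"human", "move", "self", "fast", "on-land"}}
group_b = {"sprinter": {"human", "move", "self", "fast", "on-land"}}

def well_grounded(lexicon):
    """True if every definition uses only the shared primitives."""
    return all(definition <= PRIMITIVES for definition in lexicon.values())

def same_meaning(def_a, def_b):
    """Concepts match when their primitive expansions coincide."""
    return def_a == def_b

assert well_grounded(group_a) and well_grounded(group_b)
print(same_meaning(group_a["runner"], group_b["sprinter"]))  # True
```

The point of the sketch is only that agreement on primitives, not on formats, is what makes the comparison possible at all.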
[SN] > So, alas, we should set our goals in a way that is somehow
> commensurate with realistic expectations.
Yes. That is why my immediate focus is on one small bite
of that ten-foot submarine; to try to determine if the Conceptual Defining
Vocabulary (foundation ontology) will expand slowly enough as new domains are
added that it will be sufficiently stable to serve as a standard of meaning.
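That stability question can in principle be tracked empirically. A toy sketch (the domains and their primitive sets are entirely made up) of counting how many genuinely new primitives each added domain forces into the foundation:

```python
# Hypothetical sketch of the stability test described above: as each new
# domain is defined in terms of the foundation vocabulary, record how many
# primitives had to be *added* because existing ones did not suffice.
# A defining vocabulary is "stable" if these increments shrink toward zero.

foundation = set()

def add_domain(required_primitives):
    """Merge a domain's primitives; return how many were genuinely new."""
    new = set(required_primitives) - foundation
    foundation.update(new)
    return len(new)

# Illustrative (made-up) domain additions:
domains = [
    {"object", "event", "agent", "time"},         # core
    {"object", "agent", "money", "exchange"},     # commerce
    {"event", "time", "money", "interest-rate"},  # finance
    {"agent", "exchange", "time"},                # negotiation: nothing new
]
growth = [add_domain(d) for d in domains]
print(growth)  # [4, 2, 1, 0] -- shrinking increments suggest stability
```

Whether real domain additions show this convergence is exactly the open empirical question.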
[SN] > This is one of the reasons why I think that no
> standards could really be enforced in this area and
> that it may be a noble but doomed task to try to
> come up with a single common syntax and semantics for
> the metalanguage for specifying knowledge about
> language and the world (whether these are different
> metalanguages or a single one).
Probably. But we do not need universal acceptance
to build a useful foundation ontology. It only has to be used by a
sufficiently large community of research groups to provide the common standard
of meaning that facilitates reuse of research results among many groups,
thereby serving as a common paradigm within which incremental improvements in
application components can cumulatively develop into powerful systems. At
present, improvements tend to occur only within individual or small clusters of
research teams, and reuse between them is very inefficient. There are
some parts of the problem (e.g. reasoning methods) where there is a high level
of reuse, but much of NLP tends to evolve as local systems. To be sure,
some of them have grown impressive, but still have a long way to go to look
like human performance. I think that a common foundation ontology would
be a powerful tool for NLU research, however parsimoniously it is funded.
All suggestions for changes or additions to the COSMO are
welcome, including pointers to a full dump of any foundation ontology that anyone
thinks is useful.
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Sergei
Sent: Monday, March 31, 2008 11:09 AM
Subject: Re: [ontolog-forum] What is "understanding" - was: Building on common ground
I am very happy to see a public discussion of issues that
I have been studying for many years. It feels, in part,
that we are back in 1982. Of course, we have learned
a lot since then, as a field.
I think that the core of the new understanding can be
reduced to two high-level points.
First, our main problem is well beyond choosing the
"right" knowledge representation schema: it is all about
the content of knowledge, not the format.
Second, the scope of the work is so much broader than
the most sober among us had expected.
A few years ago I compared the outlays for the
Manhattan project, the Human Genome project
and NLP. The statistics were not complete, I am
sure, but the trend was clear: our work has been
funded at a small fraction of those projects.
It is societally understandable, of course. But the
bad news is that I think our problem is more
complex than either of those problems...
So, alas, we should set our goals in a way that is somehow
commensurate with realistic expectations. The bad news
here is that there will be no instant gratification on a
grand scale. Not for our generation.
As to knowledge acquisition work, it is thankless:
while it can be intellectually quite demanding,
one can't defend a dissertation in it, so students
naturally prefer working either on formalisms,
on theorem provers, or on statistics-oriented approaches.
This is one of the reasons why I think that no
standards could really be enforced in this area and
that it may be a noble but doomed task to try to
come up with a single common syntax and semantics for
the metalanguage for specifying knowledge about
language and the world (whether these are different
metalanguages or a single one).
I'll also make some comments inline.
On Mar 30, 2008, at 11:26 PM, John F. Sowa wrote:
I'll accept parts of what both of you are saying, but with some qualifications.
Before getting to the qualifications,
I'll quote an example I've used before.
Following are four sentences that use the same verb in
a similar syntactic pattern, but with very different,
highly domain-dependent senses:
He supported the tomato plant with a stick.
He supported his daughter with $20,000 per year.
He supported his father with a decisive argument.
He supported his partner with a bid of 3 spades.
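For illustration only, here is a toy disambiguator (the sense names and cue lists are invented; no serious system works this crudely) that picks among those senses of "support" by overlap with domain cues:

```python
# Toy sketch: choose the sense of "support" whose domain-specific cues
# best overlap the words of the sentence. The senses and cue lists are
# hypothetical stand-ins for real background knowledge.

SENSES = {
    "hold-up-physically":  {"stick", "plant", "pole", "beam"},
    "provide-financially": {"$", "year", "income", "pay"},
    "argue-for":           {"argument", "claim", "evidence"},
    "bid-in-bridge":       {"bid", "spades", "partner", "trump"},
}

def disambiguate(sentence):
    """Pick the sense whose cues overlap the sentence the most."""
    words = set(sentence.lower().replace(",", "").split())
    def score(cues):
        # Substring check so a cue like "$" matches the token "$20000".
        return sum(any(cue in w for w in words) for cue in cues)
    return max(SENSES, key=lambda s: score(SENSES[s]))

print(disambiguate("supported the tomato plant with a stick"))
# -> hold-up-physically
print(disambiguate("supported his partner with a bid of 3 spades"))
# -> bid-in-bridge
```

The trivial cue-overlap trick works on these four sentences only because the cue sets were hand-picked; the hard problem is acquiring and organizing that background knowledge at scale, which is what the discussion below is about.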
Making those choices requires quite a bit of background knowledge,
and it's definitely nontrivial with current technology. But the
next question is what to do with that choice. It might be useful
in machine translation for picking the correct verb in some
target language, and a statistical translator with enough data could
do so. But could that be called "understanding"?
Well, there's no problem with calling this a modicum of understanding, though
not complete understanding...
Suppose we had a word-expert analyzer with an enormous amount
of information about each verb. Would that be the best way to
organize the knowledge base? Would you put some knowledge about
bridge or tomatoes into some rules for each verb, noun, and
adjective that might refer to bridge or to tomatoes? Or would
it be better to put all the knowledge about bridge in a module
that deals with bridge and all the knowledge about tomatoes in
a module that deals with tomatoes?
Any which way. Let it even be inefficient. But we need
cross-indexed descriptions of complex events with their components
and participants, pre- and post-conditions, and other properties.
We are building such entities for a few projects we are working
on, and, of course, it is a slow and painful task with lots
of manual effort.
With either way of organizing the knowledge -- by words or
by subject matter -- how would you relate the lexical info
about each word to the ontology and to the background
knowledge about how to play bridge or work in the garden?
The organization in our approach is by (ontological)
elements of world knowledge,
but our lexicon expresses lexical meaning in terms of the
ontological metalanguage (there are exceptions, but talking
about them is well beyond the grain size of this message). For
the example above, there will be in the ontology the complex event
describing what happens when people play bridge, and there
will be indications in the lexicon of any idiosyncratic word
and phrase senses relating to bridge playing. Many meanings
will still be derived in a compositional way, with the knowledge
of the complex event of playing bridge serving as a (core)
heuristic for making preferences during ambiguity resolution.
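A hypothetical sketch of that arrangement (the structures and names are invented for illustration, not drawn from any actual ontology or lexicon): the ontology holds a complex event for bridge playing, the lexicon maps word senses onto ontological concepts, and an active complex event biases the sense preference:

```python
# Hypothetical sketch: an ontological complex event ("script") for playing
# bridge, a lexicon whose senses point at ontological concepts, and a
# preference rule that favors senses licensed by the active complex event.

ONTOLOGY = {
    "play-bridge": {  # complex event with roles and typical subevents
        "participants": ["player", "partner", "opponent"],
        "subevents": ["deal-cards", "bid", "play-trick", "score"],
    },
}

LEXICON = {
    "support": [
        {"sense": "hold-up-physically", "concept": "physical-support"},
        {"sense": "bid-in-bridge", "concept": "bid"},  # tied to the script
    ],
}

def prefer_sense(word, active_script):
    """Prefer the sense whose concept figures in the active complex event."""
    script = ONTOLOGY.get(active_script, {})
    licensed = set(script.get("subevents", [])) | set(script.get("participants", []))
    senses = LEXICON[word]
    for s in senses:
        if s["concept"] in licensed:
            return s["sense"]
    return senses[0]["sense"]  # fall back to the default (first) sense

print(prefer_sense("support", "play-bridge"))  # -> bid-in-bridge
print(prefer_sense("support", None))           # -> hold-up-physically
```

The design point is the one made above: the knowledge lives with the complex event, not duplicated inside every word entry that might touch bridge.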
BTW, there are many more kinds of ambiguity to deal with
in addition to word sense or PP attachment (to name a couple
that have been in the center of the field's attention):
referential ambiguities, semantic ambiguity,
non-literal language-related ambiguity, etc.
If you intend to use logic, how much logic would be needed
for those sentences? What would a theorem prover do to aid
in interpreting them? Would proving some theorem about tomatoes be relevant?

A theorem prover can be adapted to drive
the ambiguity resolution process.
However, the main issue is that we need to be able to
make successful inferences against a knowledge base that is
neither sound nor complete. That's reality. So, if logic can
come up with methods that support such a task, great.
Otherwise, we scruffies will have to make do with whatever works.
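One scruffy possibility, sketched here with invented rules and weights: instead of demanding a strict proof, score competing hypotheses by how much weighted evidence supports them and accept the best-supported one, tolerating rules that are individually unsound:

```python
# Toy sketch of inference over a knowledge base that is neither sound nor
# complete: weighted evidence-scoring in place of strict proof. The rules
# and weights are made up for illustration.

RULES = [
    # (conclusion, required evidence, weight/reliability)
    ("is-bird", {"has-feathers"},       0.9),
    ("is-bird", {"flies"},              0.5),  # unsound on its own: bats fly too
    ("is-bat",  {"flies", "nocturnal"}, 0.7),
]

def best_hypothesis(facts):
    """Return the conclusion with the highest total weighted support."""
    scores = {}
    for conclusion, needed, weight in RULES:
        if needed <= facts:  # all required evidence is present
            scores[conclusion] = scores.get(conclusion, 0.0) + weight
    return max(scores, key=scores.get) if scores else None

print(best_hypothesis({"has-feathers", "flies"}))  # -> is-bird (0.9 + 0.5)
print(best_hypothesis({"flies", "nocturnal"}))     # -> is-bat (0.7 beats 0.5)
```

Nothing here is a proof in the logician's sense; the incomplete KB simply never blocks an answer, and unsound rules are survivable because they are outvoted.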
Is it likely that a bunch of people (similar to the …)
would be willing and able to enter the kinds of knowledge in the
kinds of formats necessary for a system that understands?
I think that hoping that something like this will be done by
enthusiasts is, err, premature. As the saying goes, you get
what you pay for (and this is not - entirely - a cynical
view :-) ).
Realistically, though, the field doesn't have the
funding for this kind of effort.
The Cyc project has been paying professional knowledge engineers
to enter such knowledge into their system for the past 22 years.
They had two million axioms in 2004, but Cyc still can't read
a book in order to build up its knowledge base. How much more
would be needed? Or is there some better way?
As far as I know, many of the Cyc axioms are actually facts
(e.g., knowledge about Austin, TX), not concepts (knowledge
about cities in general). Also, it is instructive that they
seem to be using the knowledge base only for statistical NLP
(my information may be wrong here, though).
As for reading a book, we have started a project that uses our
current, limited ontology/lexicon/grammar/preprocessing resources
to extract knowledge of unknown concepts/words from the web.
But it's just scratching the surface, however exciting the
project may be...
There is much, much more that can be said, of course. It would be
nice to talk about this in person, not over e-mail...