A few comments in reply to some remarks of John Sowa: (01)
In this and a number of other notes, JFS seems to be saying that the failure of
some groups to achieve a goal means that no amount of effort along a related
but different path can succeed: (02)
> Also look at the Japanese EDR project (use Google for ref's). The Japanese
> gov't poured billions of yen into developing a "concept dictionary" with
> 400,000 concepts with mappings to both English and Japanese. CSLI at Stanford
> had a copy of it, and I asked people there whether anybody was using it. The
> answer I got is that nobody had found anything useful to do with it.
>
> Use your favorite search engine to look for references to the SENSEVAL
> projects. If you want more info, subscribe to Corpora List and ask people
> there what they think about these issues. Adam Kilgarriff, by the way, was
> one of the organizers of the SENSEVAL projects.
>
> Researchers on machine translation tried to develop an Interlingua of
> concepts (word senses) that would be useful for MT. They failed in the same
> way as EDR: none of them produced useful results that justified a
> continuation of the R & D funding.
>
> If you want to make any claims about the value of a large inventory of
> logic-based concepts, you have to explain how your method would differ from
> Cyc. If you want something freely available, explain what you would do that
> is different from OpenCyc.
>
> By the way, I spoke with Ron Kaplan from PARC, then PowerSet, and later
> Microsoft. They had a license to use *all* of Cyc including all the logic and
> the tools. But Ron said that they just used the hierarchy. They did *not* use
> the axioms. In any case, most of the PARC/PowerSet people have left Microsoft
> -- that includes Ron K. That's not a point in its favor. (03)
I am aware of all of those projects, and have spoken directly to some of the
principals, including Doug Lenat and Ron K, and can say that none of those
efforts has even tried the approach I suggested, for various reasons. Mostly,
none of those groups had the time or inclination to pursue the long-term goal
of human language understanding even at the most elementary level, because
their funded goals required that they directly pursue more immediately
practical results for processing texts (or in the case of Cyc, databases) in a
manner that would not itself provide human-level understanding, but could
present a "good enough" set of possible interpretations that a human would then
evaluate. For NL, the statistical approach made possible by massive amounts of
text, including some annotated text, proved to be at least as good as, or better
than, the more time-consuming syntactic and conceptual paths, by the measures
used. So the statistical approach has become vastly better funded than the
ontological/analytical. There is no way to know whether the same amount of
effort devoted to an ontological approach that is coordinated with development
of a primitives-based foundation ontology would or would not have succeeded
better or faster. I think it would have. But in any case it is incorrect to
say that the approach I have suggested to developing a primitives-based
foundation ontology for NLP has already been tried. It hasn't. (04)
This is not to say that a statistical approach is misguided. On the contrary,
I think it closely mimics the early stages of language understanding in humans,
but fails at the secondary analytical stage which determines whether the "most
probable" interpretations make sense in the context of the communication.
Those who have read "Thinking, Fast and Slow" might consider the statistical
versus the syntactic, logical (ontological) methods of processing language as
being analogous to "system 1" and "system 2" respectively in Kahneman's
problem-solving domain. As Kahneman found, the statistical approach is fast
(particularly suited to the parallel connective nature of the brain), but
error-prone, needing supplementation from a slower more logical "system" in the
brain to get usable results when accuracy is important. So, research on a fast
statistical approach is entirely appropriate, but the current strong emphasis
on the statistical approach is, I believe, retarding progress by failing to
develop even the most basic resources needed for the analytical stage 2. (05)
Among the prominent American NL researchers I think that Hovy was most
determined to try the analytical approach, but eventually (not sure of all the
details) wound up using mostly WordNet as the primary lexical resource (better
coverage, I think, for the broad texts he had to analyze). When it became
obvious that WordNet was not satisfactory, he reorganized it by aggregating
some senses, which necessarily improved the inter-annotator agreement by
reducing the number of senses that had to be considered. But that left many of
the problems in WordNet unaddressed. (06)
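To illustrate why aggregating senses "necessarily" raises raw agreement, here is a minimal Python sketch. The sense labels, the two annotators' choices, and the merge table are all made up for illustration; they are not taken from WordNet or OntoNotes. Any pair of labels that agreed before merging still agrees afterwards, so simple percentage agreement can only stay the same or go up. Chance-corrected measures such as kappa are a separate matter and are not computed here.

```python
from typing import Dict, List


def observed_agreement(a: List[str], b: List[str]) -> float:
    """Fraction of items on which two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def coarsen(labels: List[str], merge: Dict[str, str]) -> List[str]:
    """Replace each fine-grained label by its merged group, if it has one."""
    return [merge.get(label, label) for label in labels]


# Hypothetical fine-grained sense choices for five occurrences of one word.
annotator_1 = ["bank.1", "bank.2", "bank.3", "bank.1", "bank.4"]
annotator_2 = ["bank.1", "bank.3", "bank.3", "bank.2", "bank.4"]

# Hypothetical aggregation: senses 2 and 3 are treated as one coarse sense.
merge = {"bank.2": "bank.A", "bank.3": "bank.A"}

fine = observed_agreement(annotator_1, annotator_2)
coarse = observed_agreement(coarsen(annotator_1, merge),
                            coarsen(annotator_2, merge))

print(f"fine-grained agreement:   {fine:.2f}")
print(f"after merging two senses: {coarse:.2f}")
```

The agreement goes up only because a distinction was removed, not because either annotator understood the text any better, which is the limitation noted above.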
My humble hypothesis is that a resource better than WordNet for NLP can and
should be developed, and the time-consuming task of annotating enough text to
allow it to work with the statistical approach should go together with that
project. Exactly how to achieve that can be debated, but the big problem is
that no such project other than Hovy's aggregating project "OntoNotes" (which
reduces discrimination of meanings without making them more accurate) seems to
be in progress. More progress may yet occur in NLP by just piling on more
statistical data, but that will likely hit an asymptote that can only be overcome
by developing a better semantic dictionary. That is why I am focusing on that
particular project. Maybe my specific approach is not optimal, but some effort
in that direction is, IMHO, vastly better than none. (07)
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John F Sowa
Sent: Saturday, August 03, 2013 10:54 AM
Subject: Re: [ontolog-forum] Context and Inter-annotator agreement (010)
Pat, Michael, Ed, and William, (011)
I strongly believe in the need for good lexical resources.
For examples of the resources VivoMind uses, see the web site that Arun
Majumdar maintains and the references cited there: (012)
I also believe that no single resource can be necessary, sufficient, or even
adequate for the task of language understanding. The title of the following
paper is "Two paradigms are better than one and multiple paradigms are even
better".
For a brief summary of the insufficient attempts over the past few millennia,
see the list (Slide 10) at the end of this note. (016)
>>> Those meanings that can be reliably distinguished (>98%) by
>>> motivated (rewarded for accuracy) human annotators. (017)
>> There are no such meanings -- except in very special cases. (018)
> I think that human performance with real informative text is typically
> above that level, when one is trying to be accurate and not sloppy or hurried. (019)
Yes. But there is a *huge* difference between using language precisely for
human to human communication and the artificial exercise of trying to select a
word sense from some list. (020)
> one has to start with some inventory of senses, but the most detailed
> inventory yet used for such tests by NL researchers is WordNet, which
> is not a good standard for such testing. (021)
No. An inventory of word senses is neither necessary nor sufficient for NLP.
WordNet is widely used because it's free, but see Slide 10. (022)
Also look at the Japanese EDR project (use Google for ref's). The Japanese
gov't poured billions of yen into developing a "concept dictionary" with
400,000 concepts with mappings to both English and Japanese. CSLI at Stanford
had a copy of it, and I asked people there whether anybody was using it. The
answer I got is that nobody had found anything useful to do with it. (023)
Use your favorite search engine to look for references to the SENSEVAL
projects. If you want more info, subscribe to Corpora List and ask people
there what they think about these issues. Adam Kilgarriff, by the way, was one
of the organizers of the SENSEVAL projects. (024)
>> Unfortunately, there is no finite "set of senses" that can be used to
>> achieve "human-level interpretation of a broad range of texts." (025)
> That is a bold claim... my observations suggest that no remotely
> applicable test has yet been conducted to see if such a claim is even
Researchers on machine translation tried to develop an Interlingua of concepts
(word senses) that would be useful for MT. They failed in the same way as EDR:
none of them produced useful results that justified a continuation of the R &
D funding. (027)
> Until we develop a logic-based word sense inventory intended for broad
> use I don't see how the maximum agreement could be tested. (028)
If you want to make any claims about the value of a large inventory of
logic-based concepts, you have to explain how your method would differ from
Cyc. If you want something freely available, explain what you would do that is
different from OpenCyc. (029)
By the way, I spoke with Ron Kaplan from PARC, then PowerSet, and later
Microsoft. They had a license to use *all* of Cyc including all the logic and
the tools. But Ron said that they just used the hierarchy. They did *not* use
the axioms. In any case, most of the PARC/PowerSet people have left Microsoft
-- that includes Ron K. That's not a point in its favor. (030)
> Fundamental principle: People think in *words*, not in *word senses*. (031)
> Really? I sure don’t. Without the textual content to disambiguate
> words, communication would be extremely error-prone.
> Where does that notion come from? (032)
The technical term 'disambiguate' is used by some linguists to describe a stage
in some programs that attempt to understand language. (033)
At age 3, Laura understood and generated language far better than any of those
programs. She didn't "disambiguate" words, she just *used* words in meaningful
patterns. See the article by John Limber cited in slide 8 of
http://www.jfsowa.com/talks/goal.pdf . (034)
> In order to understand a sentence, it is not enough to pick the right
> word sense. You have to understand the sense and how it can modify other
> You need context, background knowledge and the ability for abstraction
> and generalization. You need to be able to follow the line of thinking
> of the author. (035)
That's fairly close to what the VivoMind software does. It does *not* have a
stage that could be called "word sense disambiguation". Instead, the system
uses a large inventory of graph patterns and a kind of associative memory for
matching its inventory to the patterns that occur in the text. See the
paradigm.pdf article. (036)
Those patterns can come from multiple sources. The words are organized in a
hierarchy of types and subtypes, but they are *not* labeled with a fixed set of
word senses. Instead, new patterns are generated dynamically from various
sources, and they are added to the inventory for future use. (037)
Interesting point: If the VivoMind software is used to reread the same
document on a second pass, it generates a different and usually better
interpretation (an interconnected graph of everything derived from the
document). That's closer to what people do. (038)
> Or take my own sentence:
>> When you put words together, you often create completely new senses
>> that cannot be grasped by looking at individual word senses only.
> How do you get from "put" and "together" to "put together" ? There are
> many senses for "looking" in Wordnet but I cannot find the right one.
> It is not used literally here. (039)
That's a good example. By using the original words from the text as labels on
the graphs, the VivoMind software can use patterns from the literal use of the
word 'look' to interpret metaphorical uses. (040)
> For example, if I say "Michael eats new technologies for breakfast",
> you could understand this even if you have never heard this metaphor
> before. The first time you heard such a thing you would have no doubt
> as to its meaning. This is not poetry, but is its kin, as is all speech. (041)
That is what the VivoMind software would do with that sentence. It would use
the common pattern for Eat to interpret it, but it would note that technologies
are not a common kind of food. If it couldn't find a better pattern in its
inventory, it would use the one it found. But it would also evaluate the
pattern match as less than perfect. (042)
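The kind of graded matching described for the Eat example can be sketched in a few lines of Python. This is only a toy under stated assumptions and is not the VivoMind implementation: the type hierarchy, the Eat pattern with its agent and patient roles, and the scoring formula are all hypothetical. A filler that is a subtype of the expected type counts as a perfect fit; otherwise the penalty is the path length between the two types in the hierarchy.

```python
from typing import Dict, List

# Hypothetical toy type hierarchy, stored as child -> parent.
PARENT: Dict[str, str] = {
    "Person": "Animal",
    "Animal": "Entity",
    "Cereal": "Food",
    "Food": "PhysicalObject",
    "PhysicalObject": "Entity",
    "Technology": "Abstraction",
    "Abstraction": "Entity",
}


def ancestors(type_name: str) -> List[str]:
    """Return the chain from a type up to the root, starting with itself."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def type_distance(a: str, b: str) -> int:
    """Number of edges on the path between two types via their nearest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError(f"no common ancestor for {a} and {b}")


def role_mismatch(filler: str, expected: str) -> int:
    """Zero if the filler is the expected type or a subtype of it, else the path length."""
    return 0 if expected in ancestors(filler) else type_distance(filler, expected)


def match_score(pattern: Dict[str, str], fillers: Dict[str, str]) -> float:
    """Score in (0, 1]; 1.0 means every role filler fits its expected type."""
    total = sum(role_mismatch(fillers[role], expected)
                for role, expected in pattern.items())
    return 1.0 / (1.0 + total)


# Hypothetical pattern for Eat: an Animal eats some Food.
EAT_PATTERN = {"agent": "Animal", "patient": "Food"}

literal = match_score(EAT_PATTERN, {"agent": "Person", "patient": "Cereal"})
metaphor = match_score(EAT_PATTERN, {"agent": "Person", "patient": "Technology"})

print(f"Michael eats cereal:           {literal:.2f}")
print(f"Michael eats new technologies: {metaphor:.2f}")
```

The metaphorical sentence still matches the Eat pattern, but with a lower score, so the pattern can be used while being flagged as an imperfect fit.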
> meanings are often gradients that are noticed only when the distance
> between two points is great enough, creating what some call a 'different'
> sense, and can be blended in a variety of ways. (043)
Yes. A semantic distance measure is essential for evaluating the pattern
matching that occurs during language processing. The question of what distance
is "great enough" is highly idiosyncratic. (044)
> It is this that makes language actually a game that people play.
> Communicating with people means getting the hang of this game. (045)
That was the main point of Wittgenstein's later philosophy. I believe it's
fundamental to understanding language of any kind -- natural or artificial, by
humans or by computers. (046)
> The purpose of natural language dictionaries is to inform humans who
> presumably have some familiarity with the language. So, primitive
> terms are "defined" by providing synonyms and circular
> circumlocutions. The idea is that the human reader will recognize
> enough of that verbiage to grasp the intended concept, by being
> familiar with the concept itself, presumably in other terms. (047)
I agree. And I would note that the so-called "word senses" represent some
lexicographers' choices in grouping the citations from which a dictionary is
derived. That organization is helpful for some purposes, but there is no
evidence for a fixed, universal set. (048)
From Slide 10 of http://www.jfsowa.com/talks/kdptut.pdf (050)
PROSPECTS FOR A UNIVERSAL ONTOLOGY (051)
Many projects, many useful theories, but no consensus: (052)
● 4th century BC: Aristotle’s categories and syllogisms.
● 12th to 16th c AD: Scholastic logic, ontology, and semiotics.
● 17th c: Universal language schemes by Descartes, Mersenne, Pascal, Leibniz,
Newton, Wilkins. L’Académie française.
● 18th c: More schemes. Satire of the Grand Academy of Lagado by Jonathan
Swift. Kant’s categories.
● 19th c: Ontology by Hegel, Bolzano. Roget’s Thesaurus. Boolean algebra.
Modern science, philosophy of science, early computers.
● Late 19th and early 20th c: FOL. Set theory. Ontology by Peirce, Brentano,
Meinong, Husserl, Leśniewski, Russell, Whitehead.
● 1970s: Databases, knowledge bases, and terminologies.
● 1980s: Cyc, WordNet, Japanese Electronic Dictionary Research.
● 1990s: Many research projects. Shared Reusable Knowledge Base (SRKB), ISO
Conceptual Schema, Semantic Web.
● 21st c: Many useful terminologies, but no universal ontology. (053)
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J (055)