John,
I think it's clear that we both agree that understanding natural language will
require a combination of techniques, and your work is a good example of that.
We do seem, however, to have different approaches to semantic
primitives. (01)
[JFS] > If you want a project that includes all word senses anyone considers
important, I suggest Wiktionary. It has "3,476,017 entries with English
definitions from over 500 languages":
I actually expect that will happen eventually (within the next 100 years),
but the immediate goal is much more modest. I want to be able to build a
machine that can understand, and fluently talk to, a 6-year-old native speaker
of English. As you have noted, that is itself quite challenging, but it
requires a vocabulary of only 5-10 thousand word senses. (02)
[JFS] > More precisely, the idea of selecting a small number of primitives for
defining everything is one of the oldest in the history of philosophy, logic,
linguistics, and AI.
Sure, and I derive a lot of inspiration from earlier efforts. But prior
to 1990, there was inadequate (or no) computer processing capability and a
poor understanding of ontological meaning representation. The work since then
is most relevant. I do appreciate all the reminders of earlier work, some of
which was remarkably precocious, given the crude technical tools. I
particularly liked Roget's work, 200 years ago. (03)
[JFS] > I never said "No amount of effort trying a related but different way
can succeed." In fact, I have been proposing and *using* related methods, but
I always insist on keeping all options open.
Here we can agree completely. I have never denigrated alternative
approaches, but I lament the current imbalance of statistical and analytical
efforts. (04)
[JFS] > Anna Wierzbicka spent many years working on issues of selecting and
> using a proposed set of primitives (05)
Yes, I enjoyed reading her work and found it remarkable how much can be
expressed with fewer than 100 primitives, but I have found many concepts that
require a lot more than that. The current COSMO has 7500 types and 750
relations, though many of those are not primitives, just elements included
for ease of understanding and processing. (06)
[JFS] > There is no evidence that a fixed set exists, and an overwhelming
amount of evidence that Zipf's Law holds: there is an extremely long tail to
the distribution of word senses. But if you keep your options open and *if* a
fixed set of primitives is sufficient, then you will discover that set. That
is my recommended strategy. (07)
Except for saying that "there is no evidence . . . " I wouldn't disagree with
any of that. The "evidence" I am developing is that I can identify a set of
primitives with which I can define almost anything - but the work is
incomplete, and there are probably still some primitives to identify. If you
think that there are any concepts that cannot be expressed by the elements in
the COSMO ontology, I would very much like to know which they are so that I can
see if new primitives are required. Surely, that is "evidence"??? Of
course, "proving" that a set of primitives is adequate for **broad**
interoperability is a much more challenging task than providing evidence. The
problem is that "proving" that a particular set of primitives is adequate for
general interoperability requires developing multiple non-trivial
ontology-based applications and showing that they can interoperate via the
primitives-based foundation ontology. The tail for new primitives may be
long, but for any *given* set of ontologies or applications there must be a
finite identifiable set of primitives. When new applications are developed,
then new primitives may (or may not) be needed. The big unknown is whether
there is a limit, or asymptote, to the number of new primitives required. We
may never know, but
at any given time we can use a given set of primitives adequate for all of the
(many many) applications of interest to us. My current interest is in
identifying a reasonable starting set of primitives that can be tested and
supplemented as required. (08)
[PC]
>> the current strong emphasis on the statistical approach is, I believe,
>> retarding progress by failing to develop even the most basic resources
>> needed for the analytical stage 2 function.
>
> I wholeheartedly agree. But from a selfish point of view, that gives us a
> competitive advantage. We got a contract with the US Dept. of Energy based
> on a competition with a dozen groups that used their favorite methods of
> NLP. (09)
That is a great advantage when you can get the funding. Live long, work
hard, and succeed!! I look forward to new revelations. (010)
Pat (011)
Patrick Cassidy
MICRA Inc.
cassidy@xxxxxxxxx
1-908-561-3416 (012)
-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John F Sowa
Sent: Sunday, August 04, 2013 1:28 PM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] Context and Inter-annotator agreement (013)
Pat, (014)
PC
> The point at issue is whether all of the senses of a particular word
> needed for language understanding can be included in a semantic lexicon.
> My experience suggests that they can, even though new senses are being
> developed all the time. The new senses can also be included in the
> lexicon, if they are important enough to warrant the effort. (015)
That claim is vague enough to cover all bases. If you want a project that
includes all word senses anyone considers important, I suggest Wiktionary. It
has "3,476,017 entries with English definitions from over 500 languages": (016)
http://en.wiktionary.org/wiki/Wiktionary:Main_Page (017)
Large numbers of people around the world are actively updating and extending Wiktionary.
When the number of senses is in the millions and growing, it seems hard to
claim that there is any finite upper limit. (018)
PC
> JFS seems to be saying that failure of some groups to achieve a goal
> means that no amount of effort trying a related but different way can
> succeed (019)
More precisely, the idea of selecting a small number of primitives for defining
everything is one of the oldest in the history of philosophy, logic,
linguistics, and AI. It can be traced back at least to Pythagoras, around
500 BC, and later to Plato and Aristotle. For summaries and references, see
http://www.jfsowa.com/talks/kdptut.pdf . (020)
Slides 13 to 18: Aristotle's categories, definitions, and the Tree
of Porphyry for organizing them graphically. (021)
Slides 91 to 93: Universal language schemes in the 17th and 18th
centuries. John Wilkins developed the largest and most impressive
set of primitives (40 genera subdivided into 2,030 species). Wilkins
got help from other members to define 15,000 words in those terms.
For more information about these and other schemes, see references
by Knowlson (1975), Eco (1995), and Okrent (2009). (022)
Slides 94 to 97: Ramon Llull's Great Art (Ars Magna), which included
Aristotle's categories, the Tree of Porphyry, rotating circles
for combining categories, and a methodology for using them to
answer questions. Leibniz was inspired by Llull to encode the
primitive categories in prime numbers and use multiplication
to combine them and division to analyze them. (023)
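Leibniz's arithmetic trick can be sketched in a few lines of Python. The
category names and prime assignments below are purely illustrative (they are
not Leibniz's actual tables); the point is only that multiplication combines
primitives and divisibility recovers them:

```python
from math import prod

# Hypothetical primitive categories, each assigned a distinct prime.
PRIMES = {"animate": 2, "rational": 3, "mortal": 5}

def combine(*categories):
    """Combine primitive categories by multiplying their primes."""
    return prod(PRIMES[c] for c in categories)

def has_category(code, category):
    """Analyze a composite code by division: a primitive is present
    exactly when its prime divides the code."""
    return code % PRIMES[category] == 0

# "Human" as animate + rational + mortal:
human = combine("animate", "rational", "mortal")   # 2 * 3 * 5 = 30
```

Because the primes are distinct, unique factorization guarantees that every
composite code decomposes into exactly one set of primitives, which is why
the scheme generates a lattice.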
Slide 98: Leibniz's method generated a lattice. For modern
lattice methods, see FCA and Ranganathan's facet classification.
Click on the URLs to see FCA lattices that are automatically
derived from WordNet and from Roget's Thesaurus. (024)
Slides 99 to 101: Categories by Kant and Peirce. A suggested
updated version of Wilkins' hierarchy that includes more
modern developments. (025)
Slides 102 to 107: Issues about the possibility of ever having
a complete, consistent, and finished ontology of everything. (026)
For modern computational linguistics, the idea of selecting a set of primitives
for defining everything was proposed and implemented in the late 1950s and
early '60s: (027)
1961 International Conf. on Machine Translation. See the table
of contents: http://www.mt-archive.info/NPL-1961-TOC.htm .
At that conference, Margaret Masterman proposed a list of 100
primitive concepts, which she used as the basis for lattices
that combine them in all possible ways. Yorick Wilks worked
with Masterman and others at CLRU, and he continued to use
her list of primitives for his later work in NLP. For the
list, see http://www.mt-archive.info/NPL-1961-Masterman.pdf (028)
TINLAP (three conferences on Theoretical Issues in Natural Language
Processing from 1975 to 1987). The question of primitives was
the focus of these conferences. Yorick Wilks was one of the
organizers. Roger Schank (who also had a set of primitives for
defining action verbs) was prominent in them. For summaries,
see http://www.aclweb.org/anthology-new/T/T78/T78-1000.pdf
and http://www.aclweb.org/anthology-new/T/T87/T87-1001.pdf . (029)
Anna Wierzbicka spent many years working on issues of selecting and
using a proposed set of primitives for defining words in multiple
languages. From Wikipedia: "She is especially known for Natural
Semantic Metalanguage, particularly the concept of semantic primes.
This is a research agenda resembling Leibniz's original "alphabet
of human thought", which Wierzbicka credits her colleague, linguist
Andrzej Bogusławski, with reviving in the late 1960s." Many people
tried to use her "semantic primes" in computational linguistics,
but none of those projects were successful. (030)
I never said "No amount of effort trying a related but different way can
succeed." In fact, I have been proposing and *using* related methods, but I
always insist on keeping all options open. (031)
There is no evidence that a fixed set exists, and an overwhelming amount of
evidence that Zipf's Law holds: there is an extremely long tail to the
distribution of word senses. But if you keep your options open and *if* a
fixed set of primitives is sufficient, then you will discover that set. That
is my recommended strategy. (032)
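To make the long-tail point concrete, here is a small Python sketch under an
idealized Zipf model, in which the sense of rank r occurs with relative
frequency proportional to 1/r. The numbers are illustrative, not measurements
from any corpus:

```python
def harmonic(n):
    """Partial harmonic sum H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / r for r in range(1, n + 1))

def coverage(top_k, total):
    """Under an idealized Zipf law (frequency of rank r proportional
    to 1/r), the fraction of all occurrences covered by the top_k most
    frequent senses out of `total` ranked senses."""
    return harmonic(top_k) / harmonic(total)

# Against a million distinct senses, a 10,000-sense lexicon covers
# roughly 68% of occurrences under this model.
print(coverage(10_000, 1_000_000))
```

Since H(n) grows only logarithmically, doubling the lexicon in this model buys
a fixed small increment of coverage, which is why the tail never quite closes.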
> So the statistical approach has become vastly more funded than the
> ontological/analytical. (033)
I certainly agree with you that a deeper analysis with ontologies and related
lexical resources is essential for NL understanding. I believe that
statistical methods are useful as a *supplement* to the deeper
methods. At VivoMind Research, we use *both*, but the emphasis is
on a syntactic and semantic analysis by symbolic methods. (034)
> the current strong emphasis on the statistical approach is, I believe,
> retarding progress by failing to develop even the most basic resources
> needed for the analytical stage 2 function. (035)
I wholeheartedly agree. But from a selfish point of view, that gives us a
competitive advantage. We got a contract with the US Dept. of Energy based on
a competition with a dozen groups that used their favorite methods of NLP. (036)
For the test, all competitors were asked to extract certain kinds of data from
a set of research reports and present the results in a table.
The scores were determined by the number of correct answers. Our score was
96%. The next best was 73%. Third best was above 50%, and all the rest were
below 50%. (037)
For analyzing the documents, we used very general lexical resources and a
fairly simple general ontology. But we supplemented it with a detailed
ontology that was specialized for chemical compounds, chemical formulas, and
the related details of interest. (038)
For an example of a spreadsheet with the results, see slides 49 & 50 of
http://www.jfsowa.com/talks/relating.pdf . (039)
John (040)
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J (041)