
Re: [ontolog-forum] Context and Inter-annotator agreement

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Patrick Cassidy" <pat@xxxxxxxxx>
Date: Sun, 4 Aug 2013 21:17:12 -0400
Message-id: <19d601ce9179$840910a0$8c1b31e0$@micra.com>
John,
   I think it's clear we both agree that understanding natural language will 
require a combination of techniques, and your work is a good example of 
that.  We do, however, seem to take different approaches to semantic 
primitives.    (01)

[JFS] >  If you want a project that includes all word senses anyone considers 
important, I suggest Wiktionary.  It has "3,476,017 entries with English 
definitions from over 500 languages":
     I actually expect that will happen eventually (within the next 100 
years), but my immediate goal is a lot more modest: I want to build a 
machine that can understand and fluently talk to a 6-year-old native speaker of 
English.  As you have noted, that is itself quite challenging, but it requires 
a vocabulary of only 5,000-10,000 word senses.    (02)

[JFS] > More precisely, the idea of selecting a small number of primitives for 
defining everything is one of the oldest in the history of philosophy, logic, 
linguistics, and AI.
     Sure, and I derive a lot of inspiration from earlier efforts.  But prior 
to 1990 there was inadequate (or no) computer processing capability and a 
poor understanding of ontological meaning representation, so the work since 
then is the most relevant.  I do appreciate all the reminders of earlier work, 
some of which was remarkably precocious given the crude technical tools.  I 
particularly liked Roget's work from 200 years ago.    (03)

[JFS] > I never said "No amount of effort trying a related but different way 
can succeed."  In fact, I have been proposing and *using* related methods, but 
I always insist on keeping all options open.
   Here we can agree completely.  I have never denigrated alternative 
approaches, but I lament the current imbalance between statistical and 
analytical efforts.    (04)

[JFS] > Anna Wierzbicka spent many years working on issues of selecting and
> using a proposed set of primitives    (05)

    Yes, I enjoyed reading her work and found it remarkable how much can be 
expressed with fewer than 100 primitives, but I have found many concepts that 
require far more than that.  The current COSMO has 7,500 types and 750 
relations, though many of those are not primitives, merely elements included 
for convenience of understanding and processing.    (06)

[JFS] > There is no evidence that a fixed set exists, and an overwhelming 
amount of evidence that Zipf's Law holds:  there is an extremely long tail to 
the distribution of word senses.  But if you keep your options open and *if* a 
fixed set of primitives is sufficient, then you will discover that set.  That 
is my recommended strategy.    (07)

   Except for the claim that "there is no evidence . . . ", I wouldn't 
disagree with any of that.  The evidence I am developing is that I can 
identify a set of primitives with which I can define almost anything, though 
the work is incomplete and there are probably still some primitives to 
identify.  If you think there are concepts that cannot be expressed by the 
elements in the COSMO ontology, I would very much like to know which they 
are, so that I can see whether new primitives are required.  Surely that is 
"evidence"?  Of course, *proving* that a set of primitives is adequate for 
**broad** interoperability is a much more challenging task than providing 
evidence: it requires developing multiple non-trivial ontology-based 
applications and showing that they can interoperate via the primitives-based 
foundation ontology.    (08a)

   The tail for new primitives may be long, but for any *given* set of 
ontologies or applications there must be a finite, identifiable set of 
primitives.  When new applications are developed, new primitives may (or may 
not) be needed.  The big unknown is whether there is a limit or asymptote 
for the new primitives required.  We may never know, but at any given time 
we can use a set of primitives adequate for all of the (many, many) 
applications of interest to us.  My current interest is in identifying a 
reasonable starting set of primitives that can be tested and supplemented 
as required.    (08)
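
   To make the "finite identifiable set" point concrete, here is a minimal 
sketch in Python (with invented toy data, not COSMO itself) of how the 
primitive base of any given definitional lexicon can be computed 
mechanically: the primitives are exactly the terms that are used in 
definitions but never themselves defined.

    # Minimal sketch, toy data: find the primitive base of a lexicon
    # that maps each defined term to the terms used in its definition.

    def primitive_base(definitions):
        """Return the terms that are used but never defined."""
        defined = set(definitions)
        used = {term for body in definitions.values() for term in body}
        return used - defined

    lexicon = {
        "puppy":  ["dog", "young"],
        "dog":    ["animal", "domesticated"],
        "kitten": ["cat", "young"],
        "cat":    ["animal", "domesticated"],
    }

    print(sorted(primitive_base(lexicon)))
    # -> ['animal', 'domesticated', 'young']

For any fixed lexicon that set is finite and computable; the open question 
is only whether it stabilizes as new applications are added.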

[PC]
>> the current strong emphasis on the statistical approach is, I believe,
>> retarding progress by failing to develop even the most basic resources
>> needed for the analytical stage 2 function.
>
> I wholeheartedly agree.  But from a selfish point of view, that gives us
> a competitive advantage.  We got a contract with the US Dept. of Energy
> based on a competition with a dozen groups that used their favorite
> methods of NLP.    (09)

   That is a great advantage when you can get the funding.  Live long, work 
hard, and succeed!  I look forward to new revelations.    (010)

Pat    (011)


Patrick Cassidy
MICRA Inc.
cassidy@xxxxxxxxx
1-908-561-3416    (012)


-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx 
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John F Sowa
Sent: Sunday, August 04, 2013 1:28 PM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] Context and Inter-annotator agreement    (013)

Pat,    (014)

PC
> The point at issue is whether all of the senses of a particular word 
> needed for language understanding can be included in a semantic lexicon.
> My experience suggests that they can, even though new senses are being 
> developed all the time.  The new senses can also be included in the 
> lexicon, if they are important enough to warrant the effort.    (015)

That claim is vague enough to cover all bases.  If you want a project that 
includes all word senses anyone considers important, I suggest Wiktionary.  It 
has "3,476,017 entries with English definitions from over 500 languages":    (016)

    http://en.wiktionary.org/wiki/Wiktionary:Main_Page    (017)

Large numbers of people around the world are actively updating and extending 
Wiktionary.  When the number of senses is in the millions and growing, it 
seems hard to claim that there is any finite upper limit.    (018)

PC
> JFS seems to be saying that failure of some groups to achieve a goal 
> means that no amount of effort trying a related but different way can 
> succeed    (019)

More precisely, the idea of selecting a small number of primitives for defining 
everything is one of the oldest in the history of philosophy, logic, 
linguistics, and AI.  It can be traced back at least to 500 BC with Pythagoras, 
Plato, and Aristotle.  For summaries and references, see 
http://www.jfsowa.com/talks/kdptut.pdf .    (020)

Slides 13 to 18:  Aristotle's categories, definitions, and the Tree
    of Porphyry for organizing them graphically.    (021)

Slides 91 to 93:  Universal language schemes in the 17th and 18th
    centuries.  John Wilkins developed the largest and most impressive
    set of primitives (40 genera subdivided into 2,030 species).  Wilkins
    got help from other members to define 15,000 words in those terms.
    For more information about these and other schemes, see references
    by Knowlson (1975), Eco (1995), and Okrent (2009).    (022)

Slides 94 to 97:  Ramon Llull's Great Art (Ars Magna), which included
    Aristotle's categories, the Tree of Porphyry, rotating circles
    for combining categories, and a methodology for using them to
    answer questions.  Leibniz was inspired by Llull to encode the
    primitive categories in prime numbers and use multiplication
    to combine them and division to analyze them.    (023)
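
A rough illustration of Leibniz's arithmetic (a sketch only, with the 
category-to-prime assignments invented for the example): give each 
primitive a distinct prime, multiply to combine primitives into a 
composite concept, and use divisibility to test containment.

    # Leibniz's scheme in miniature: primes stand for primitives,
    # products for composite concepts, divisibility for containment.

    PRIMES = {"animal": 2, "rational": 3, "mortal": 5}

    def encode(primitives):
        n = 1
        for p in primitives:
            n *= PRIMES[p]
        return n

    human = encode(["animal", "rational", "mortal"])   # 2*3*5 = 30
    beast = encode(["animal", "mortal"])               # 2*5 = 10

    print(human % beast == 0)   # True: human has every primitive of beast
    print(beast % human == 0)   # False: beast lacks 'rational'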

Slide 98:  Leibniz's method generated a lattice.  For modern
    lattice methods, see FCA and Ranganathan's facet classification.
    Click on the URLs to see FCA lattices that are automatically
    derived from WordNet and from Roget's Thesaurus.    (024)
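
For readers unfamiliar with FCA, a minimal sketch in Python (with a toy 
context invented here, not the WordNet or Roget data): every concept 
intent is an intersection of object intents, so closing the intents under 
intersection enumerates the whole lattice.

    # Minimal FCA sketch: enumerate the formal concepts of a toy context
    # by closing the object intents under intersection.
    from itertools import combinations

    context = {                       # object -> set of attributes
        "lion":    {"mammal", "predator"},
        "eagle":   {"bird", "predator"},
        "sparrow": {"bird"},
    }

    all_attrs = frozenset(a for attrs in context.values() for a in attrs)
    intents = {all_attrs} | {frozenset(v) for v in context.values()}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(intents), 2):
            if (c := a & b) not in intents:
                intents.add(c)
                changed = True

    # Each closed intent determines one concept: (extent, intent).
    for intent in sorted(intents, key=len):
        extent = {obj for obj, attrs in context.items() if intent <= attrs}
        print(sorted(extent), sorted(intent))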

Slides 99 to 101:  Categories by Kant and Peirce.  A suggested
    updated version of Wilkins' hierarchy that includes more
    modern developments.    (025)

Slides 102 to 107:  Issues about the possibility of ever having
    a complete, consistent, and finished ontology of everything.    (026)

For modern computational linguistics, the idea of selecting a set of primitives 
for defining everything was proposed and implemented in the late 1950s and 
early '60s:    (027)

1961 International Conf. on Machine Translation.  See the table
    of contents: http://www.mt-archive.info/NPL-1961-TOC.htm .
    At that conference, Margaret Masterman proposed a list of 100
    primitive concepts, which she used as the basis for lattices
    that combine them in all possible ways.  Yorick Wilks worked
    with Masterman and others at CLRU, and he continued to use
    her list of primitives for his later work in NLP.  For the
    list, see http://www.mt-archive.info/NPL-1961-Masterman.pdf    (028)

TINLAP (three conferences on Theoretical Issues in Natural Language
    Processing from 1975 to 1987).  The question of primitives was
    the focus of these conferences.  Yorick Wilks was one of the
    organizers.  Roger Schank (who also had a set of primitives for
    defining action verbs) was prominent in them.  For summaries,
    see http://www.aclweb.org/anthology-new/T/T78/T78-1000.pdf
    and http://www.aclweb.org/anthology-new/T/T87/T87-1001.pdf .    (029)

Anna Wierzbicka spent many years working on issues of selecting and
    using a proposed set of primitives for defining words in multiple
    languages.  From Wikipedia:  "She is especially known for Natural
    Semantic Metalanguage, particularly the concept of semantic primes.
    This is a research agenda resembling Leibniz's original "alphabet
    of human thought", which Wierzbicka credits her colleague, linguist
    Andrzej Bogusławski, with reviving in the late 1960s."  Many people
    tried to use her "semantic primes" in computational linguistics,
    but none of those projects were successful.    (030)

I never said "No amount of effort trying a related but different way can 
succeed."  In fact, I have been proposing and *using* related methods, but I 
always insist on keeping all options open.    (031)

There is no evidence that a fixed set exists, and an overwhelming amount of 
evidence that Zipf's Law holds:  there is an extremely long tail to the 
distribution of word senses.  But if you keep your options open and *if* a 
fixed set of primitives is sufficient, then you will discover that set.  That 
is my recommended strategy.    (032)
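
To see why the long tail matters, a back-of-the-envelope sketch with 
idealized Zipfian weights (not real corpus counts): if the sense of rank 
r occurs with frequency proportional to 1/r, then a fixed set of 
top-ranked senses covers a shrinking share of usage as the inventory of 
senses grows.

    # Idealized Zipf sketch: fraction of occurrences covered by the
    # top_k ranks when frequency falls off as 1/rank.
    def zipf_share(top_k, vocab_size):
        weights = [1.0 / r for r in range(1, vocab_size + 1)]
        return sum(weights[:top_k]) / sum(weights)

    for v in (10_000, 100_000, 1_000_000):
        print(v, round(zipf_share(1_000, v), 3))
    # -> 0.765, 0.619, 0.52: the same top 1,000 senses cover less and
    #    less as the vocabulary grows.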

> So the statistical approach has become vastly more funded than the 
> ontological/analytical.    (033)

I certainly agree with you that a deeper analysis with ontologies and related 
lexical resources is essential for NL understanding.  I believe that 
statistical methods are useful as a *supplement* to the deeper
methods.   At VivoMind Research, we use *both*, but the emphasis is
on a syntactic and semantic analysis by symbolic methods.    (034)

> the current strong emphasis on the statistical approach is, I believe,
> retarding progress by failing to develop even the most basic resources
> needed for the analytical stage 2 function.    (035)

I wholeheartedly agree.  But from a selfish point of view, that gives us a 
competitive advantage.  We got a contract with the US Dept. of Energy based on 
a competition with a dozen groups that used their favorite methods of NLP.    (036)

For the test, all competitors were asked to extract certain kinds of data from 
a set of research reports and present the results in a table.
The scores were determined by the number of correct answers.  Our score was 
96%.  The next best was 73%.  Third best was above 50%, and all the rest were 
below 50%.    (037)

For analyzing the documents, we used very general lexical resources and a 
fairly simple general ontology.  But we supplemented it with a detailed 
ontology that was specialized for chemical compounds, chemical formulas, and 
the related details of interest.    (038)


For an example of a spreadsheet with the results, see slides 49 & 50 of 
http://www.jfsowa.com/talks/relating.pdf .    (039)

John    (040)


_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (042)
