Re: [ontolog-forum] Guo's word senses and Foundational Ontologies

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Sat, 30 May 2009 12:20:00 -0400
Message-id: <4A215CB0.2060601@xxxxxxxxxxx>
Pat, John B, Azamat, and Frank,    (01)

The notes in this thread have wandered all over the map in their
range of topics.  I'd like to comment on all of them in one note, 
because taken together they raise an important range of issues
that any ontology (or AI in general) must deal with.    (02)

First, I'd like to recommend Pat's slides for a good summary
of an approach to ontology based on primitives:
    http://www.micra.com/COSMO/TheFoundationOntologyForInteroperability.ppt    (03)

Although I like the slides as a summary of the approach, I still
have serious concerns about their assumptions:    (04)

  1. If you have N ontologies, the number of mappings from each to each
     is N^2.  But if you have a single universal ontology, you could
     reduce the total to 2N mappings of the universal ontology to and
     from each of the others.    (05)

  2. Many linguists, such as Anna Wierzbicka and others, have proposed
     or discovered a universal set of primitives that underlie all the
     world's languages.  Practical applications of primitives include
     Ogden's Basic English and the LDOCE list of defining terms.    (06)

  3. Therefore, a universal ontology based on a set of primitives
     derived from the linguistic R & D would be sufficient to define
     all the humanly conceivable concepts, and it would gain the
     advantage of reducing the possible mappings from N^2 to 2N.    (07)
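
The arithmetic behind assumption 1 can be checked with a short sketch. The function names are hypothetical, and the pairwise count is taken here as N(N-1) directed mappings, which grows like the N^2 quoted above; the hub count is 2N.

```python
def pairwise_mappings(n: int) -> int:
    """Directed mappings needed when each of n ontologies maps to every other one."""
    return n * (n - 1)


def hub_mappings(n: int) -> int:
    """Mappings needed when each of n ontologies maps to and from one shared hub."""
    return 2 * n


if __name__ == "__main__":
    # Compare the two counts for a few illustrative values of N.
    for n in (3, 10, 100):
        print(n, pairwise_mappings(n), hub_mappings(n))
```

For N = 3 the two counts coincide at 6; the saving appears only for larger N, and it counts mappings only, not whether each mapping through the hub preserves meaning.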

The first assumption is true *only* for the formally defined systems
of mathematics, logic, and computer science in which each term has
precisely one meaning.  Even a slight variation in the meaning of a
single term can introduce inconsistencies that cause total collapse.    (08)

Another criticism of #1 is illustrated by the universal intermediate
languages (ILs) often used in multi-language compilers, such as the GNU
compilers.  The source languages can be compiled to a common form because
they were *designed* to be compiled to a precisely defined machine code.
The IL is just a generic machine code that is somewhat more systematic
and regular than the instruction sets of most popular computers.    (09)

But a serious problem arises when trying to use the IL in a universal
translator between source languages, say FORTRAN -> IL -> C++, or
FORTRAN 90 -> IL -> FORTRAN IV.    (010)

That kind of translation can be done for some simple expressions,
but serious difficulties arise in trying to support features of
C++ that are not present in FORTRAN or features that are similar
but not identical in the two languages.    (011)

Even an expression like A+B creates problems because of all the
variations of data types in each of the languages.  For a simple
add of two integers, problems arise because of different ways
of handling overflow exceptions in the two languages.  An exact
translation of A+B to another language would have to supplement
the code with a library of error handling routines that would
accommodate all the differences in exception handling between
the two languages.    (012)
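
A minimal sketch of the overflow problem: the two functions below model two hypothetical policies for a 32-bit integer add, one that wraps silently and one that signals an error. Both policies and all names are illustrative assumptions, not the defined behavior of any particular FORTRAN or C++ implementation.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def add_wrapping(a: int, b: int) -> int:
    """Add two 32-bit integers, wrapping around on overflow (two's complement)."""
    total = (a + b) & 0xFFFFFFFF          # keep only the low 32 bits
    return total - 2**32 if total > INT32_MAX else total


def add_checked(a: int, b: int) -> int:
    """Add two 32-bit integers, raising OverflowError if the result does not fit."""
    total = a + b
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 32 bits")
    return total
```

With A = INT32_MAX and B = 1, the first policy returns INT32_MIN and the second raises an exception, so a faithful translation of A+B has to carry the source language's policy along with the operator.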

Because of these issues, *nobody* uses the IL of the GNU compilers
to do translations of any GNU language to any of the others.
It is just not practical.    (013)

When you move from math & comp. sci. to natural languages, you have
to contend with the open ended range of meanings of *every* word
in the language.  Before getting into the primitives, I'd like to
mention the following paper about primitives, which Pat cited:    (014)

http://www.une.edu.au/bcss/linguistics/nsm/pdfs/bad-arguments5.pdf    (015)

This is a good paper, which presents a strong defense of primitives
in NLs, especially the kinds of primitives proposed by Wierzbicka.
I highly recommend it as a survey of the field and the various
issues that have been raised.  But I'd like to point out several
claims that the author, Cliff Goddard, does not make:    (016)

  1. He admits that nobody has yet discovered an ideal set of
     primitives.  He notes that Wierzbicka's primitives are as
     good as any and better than most, but he does not suggest
     that the research has reached a final or even a stable
     universal set.    (017)

  2. Goddard, Wierzbicka, Ogden, LDOCE, and others *never* claim
     their primitives are as precisely defined as a mathematical
     theory or a programming language.  In fact, their examples
     show that their primitives are just as "squishy" -- i.e.,
     just as vague and fuzzy as any words in any of the languages
     they are trying to define.    (018)

  3. In various examples, Wierzbicka shows how similar words in
     different languages (such as English and Russian) expand
     into different definitions in terms of the primitives. Each
     of those definitions typically takes one or more sentences
     composed of primitives -- anywhere from a dozen to several
     dozen words.  If you expand a Russian text into primitives,
     the size expands by at least an order of magnitude, and it
     is extremely difficult or impossible to determine how to
     compress it into a smaller number of English words.    (019)

For these reasons, machine translation systems that expand one
language into a universal Interlingua have not been successful.
Many attempts have been made, but all of the projects were
canceled before any practical MT systems were produced.  For
ontology, there is even less experience:  nobody has even
attempted to build a universal Interlingua that is suitable
for translating one ontology to another.    (020)

For a survey of the issues about an Interlingua for MT, see the
following chapter from a book by John Hutchins:    (021)

    http://www.hutchinsweb.me.uk/IntroMT-6.pdf    (022)

In the summary at the end, Hutchins says "the 'conceptual meaning'
representations required for interlingua-based systems demand a
complexity of semantic analysis beyond the limitations of current
linguistic theory.  It is generally agreed that transfer-based
approaches are at present the best foundations for advances in MT."    (023)

In short, all attempts to develop a universal Interlingua for MT
have *failed*.  Linguists have retreated to the pairwise transfer
approach, which requires N^2 translators for N languages.    (024)

Doug Lenat>> The problems... are (a) there is no small set, and
 >> (b) it's almost impossible to nail down the meaning of most
 >> interesting terms, because of the inherent ambiguity in whatever
 >> set of terms are "primitive."    (025)

PC> This remark seems to be directed at "primitive terms" used in
 > language.  The kind of semantic primitives in an ontology are not
 > ambiguous, of course, so Lenat here is talking about human language.    (026)

The reason why a small set was adequate for Wierzbicka and others
is that they could take advantage of vagueness to cover a large range
of "microsenses" with a small number of primitives.    (027)

If you demand absolute precision, you need a distinct primitive
for each microsense, and Lenat's estimate of 15,000 primitives is
probably too small.  (His previous estimates about the number of
concepts and axioms needed for Cyc have always been too small.)    (028)

PC> My suggestion was that, rather than guess, we actually conduct
 > a proper study to determine whether there is a finite inventory
 > of conceptual primitives and if so what the number is.    (029)

I have no objection to that as a long-term research project.  It
might produce something useful.  But I wouldn't expect it to solve
the translation problems for a long, long time.    (030)

JB> I also wonder about the work being done at Renaissance Technology
 > with the use of the Chern-Simons algorithms used to extract patterns
 > from stream data.    (031)

Thanks for that reference.  It illustrates the advantage of statistical
methods and clever algorithms for certain kinds of problems.  Those
algorithms are complementary to ontology-based approaches, and it's
important to have methods for taking advantage of multiple paradigms.    (032)

AA> An academic geometer, James Simon, first had coauthored a
 > geometrical theory to be used for a quantum gravity string theory,
 > then left the academia to establish a hedge fund, Renaissance
 > Technologies Corporation, managing now up to $ 20 b, being 80
 > years old, and recently titled as "the smartest billionaire."
 > Statistics is still ruling the world...
 > Not Statistics but Ontology should rule the world.    (033)

I don't believe that the world should have a single ruling monarch
or dictator.  All previous attempts to establish one have been
unpleasant or worse.  Some religious leaders claim that God should
rule the world.  But in practice, that means that some finite
mortals who claim to know the infinite mind end up as dictators.    (034)

When it comes to ontology, I am willing to admit that there might
be a perfect ontology somewhere in the infinite lattice of all
possible theories.  But I seriously doubt that our finite minds
and machines will be able to discover it any time soon.    (035)

FK> Thinking is too fast and too rich in paths of associations
 > to be satisfied by snapshots of net shaped representations.    (036)

I would agree that any snapshot of human thinking, no matter what
shape it's mapped into, is likely to be superseded by a better one
fairly quickly.  Just look at all the daily patches that Microsoft
ships out for their software systems.    (037)

We might find fixed and frozen snapshots that are useful for
narrowly defined problems (such as an ontology for units of measure,
for example).  But a universal ontology of everything doesn't exist
today, and all the large attempts (such as Cyc) are undergoing
continuous revision and extension.    (038)

In summary, any foundation for ontology should accommodate continuous 
revision and update.  That is why I have recommended a hierarchy of
ontologies, not a single, fixed standard.  Let the users decide which,
if any, are appropriate for their problems.    (039)

John Sowa    (040)
