
[ontolog-forum] Project HALO lessons - was Search engine for the ontolog

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Patrick Cassidy" <pat@xxxxxxxxx>
Date: Wed, 27 Feb 2008 20:20:10 -0500
Message-id: <0f2a01c879a8$0ff6fa90$2fe4efb0$@com>

My recollection (don’t recall from where) when reading the results of the HALO project was that one of the factors that reduced Cyc’s efficacy was that they didn’t have as many chemistry specialists as the others used to help formulate the knowledge and the questions.  Another factor was that they were told that the perspicuity of the explanations would not be a factor in scoring, but it turned out that perspicuous explanations were a significant factor in scoring, and Cyc’s explanations were substantially less clear than the others’.  In a competition of this kind, there are a very large number of variables that differ from one entry to another.  It is perfectly fine for a DoD bake-off when they need to know which manufacturer can provide them with the best weapon starting next year, but it says nothing useful about specific individual components of a large system.  In general, competitions of complex systems are a very far cry from a scientific experiment where one changes one variable and sees how that affects the behavior of a system.  Changing ten thousand components at a time is guaranteed to tell you nothing about the effects of any individual component, especially in a system like an ontology where a small change in one component can have a significant effect on performance.  If you detect a difference in performance, you cannot tell with any confidence what caused the difference.  At best, the lesson about Cyc from HALO is that having a lot of common sense won’t make up for lack of expertise in specialized tasks.  Duh.  I thought we learned that a few thousand years ago – though for computer systems the same lesson was learned only about 30 years ago.


Another “lesson” from HALO is that the cost of programming textbook knowledge by hand is about ten thousand dollars per page.  OK.  But then they go out on a limb and say that that is too expensive.  What?  That’s not right.  If a college education in engineering requires that a student learn a textbook per semester per course, that would be 32 textbooks for a four-year, four-course-per-semester education.  At 500 pages each, that’s 16,000 pages – $160 million to code.  Less than we spend on a Mars probe that crashes without sending back any data (a computer might have done better).   Sounds like a lot to educate one student, but now the knowledge can be duplicated indefinitely, and we get trained engineers for the cost of one workstation each.  A bargain, I would say.  But of course, that is nonsense.  Without the basic common sense, learning ability, and other accoutrements of human intelligence even that system won’t perform at the level of a trained engineer.
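As a quick sanity check on the arithmetic above, here is a minimal sketch using only the figures quoted in the message (the $10,000/page HALO estimate, four courses per semester over four years, 500 pages per textbook); these are the message's numbers, not independent data:

```python
# Back-of-envelope check of the textbook-encoding cost estimate,
# using only the figures quoted in the message above.
cost_per_page = 10_000        # dollars per page (the quoted HALO estimate)
courses_per_semester = 4
semesters = 8                 # four academic years
pages_per_textbook = 500

textbooks = courses_per_semester * semesters       # one textbook per course
total_pages = textbooks * pages_per_textbook
total_cost = total_pages * cost_per_page

print(textbooks, total_pages, total_cost)
# → 32 16000 160000000
```

The result matches the $160 million figure in the paragraph above.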


Getting the intelligence into a computer is *not* the bottleneck.  Knowing how to use the knowledge effectively *is*.   That’s the stage we are at with ontologies.  We know how to build ontologies, but still haven’t figured out how to use them effectively for real-life purposes.


John Sowa suggested at one time that the best “challenge” problem would be to reproduce the language capabilities of a 5-year-old.  I agree, but I think that before it can become a “challenge” we still need to develop some basic standards so that research will be reusable.  To begin, I believe that it is perfectly feasible to develop a common foundation ontology to serve as a basic standard of meaning, provided that the project to develop it is adequately funded – from 5 to 20 million dollars – so as to include substantive input from a representative sample of the developers and users of ontologies.   Some standard for reasoning – perhaps IKL – would also be needed.  It doesn’t matter that some or even a majority of potential users do not choose to use the standard(s); it is only important that a large enough community use it to create an effective research community and encourage third-party vendors to supply utilities to make the standard easier to use.   Language is sufficiently modular that, given a common standard of meaning and a common reasoning method, the results of research will be highly reusable, and a language-understanding system that approaches human level can begin to evolve within that community.  At some later point in development, a “challenge” may be useful.




Patrick Cassidy



cell: 908-565-4053



From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Pat Hayes
Sent: Wednesday, February 27, 2008 3:29 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Search engine for the ontology


At 3:17 PM -0500 2/27/08, John F. Sowa wrote:


That property is only true of toy ontologies that are unable
to deal with problems that anyone would actually pay to solve:

AA> The uniqueness of ontology is that it is a single concept
 > scheme uniformly covering all things in the world.

Cyc is the largest formal ontology that has ever been implemented,
and after the first five years (from 1984 to 1989), they had to
supplement their uniform upper-level ontology with a large and
growing number of "microtheories" for the various specializations.


Just to emphasize: in Cyc, 'microtheories' are now a central methodology, precisely because there is no "single concept scheme covering all things", but instead, concepts must be 'tuned' or 'fitted' to particular contexts or tasks or topics. The concept of nucleus, for example, has a lot in common between biology, atomic physics, astronomy and linguistics, but all these fields use the term with different exact meanings and emphases. Simply distinguishing bio-nucleus from physics-nucleus, etc., as distinct concepts, loses the commonality; treating them all as instances of one super-concept leads to confusion and inconsistency.


After 20 years (around 2004), they had defined an ontology with
about 600,000 categories, 2 million axioms, and 6,000 microtheories.

Then there was the HALO project, sponsored by Bill Gates's former
buddy, Paul Allen.  For that project, three groups -- Cycorp,
OntoPrise, and SRI International -- were given the task of
representing some pages from a chemistry textbook in their
favorite notation and solving some problems that would be
typical of a college freshman course in chemistry.

Despite the fact that Cyc had a much larger knowledge base than
the other two groups, it did not help them.  The average cost
for all three groups to translate the text into their notation
was about the same -- $10,000 per page.  The average score on
the exam was about 40% to 47% correct, and Cyc had the lowest score.



We (a university-based consortium led by SRI) also beat Cyc in the RKF competition, largely because we used a graphical human interface rather than a text-based one. Natural language, it turns out, is a lousy way to communicate with a computer.






IHMC               (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.       (850)202 4416   office
Pensacola                 (850)202 4440   fax
FL 32502                     (850)291 0667    cell
http://www.ihmc.us/users/phayes      phayesAT-SIGNihmc.us

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
