My recollection (I don’t recall from where) from reading the
results of the HALO project was that one of the factors that reduced Cyc’s
efficacy was that they didn’t have as many chemistry specialists as the
other teams did to help formulate the knowledge and the questions. Another
factor was that they were told that the perspicuity of the explanations would
not count in the scoring, but perspicuous explanations turned out to be
a significant factor, and Cyc’s explanations were
substantially less clear than the others’. In a competition of this kind,
there are a very large number of variables that differ from one entry to
another. It is perfectly fine for a DoD bake-off when they need to know
which manufacturer can provide them with the best weapon starting next year,
but it says nothing useful about specific individual components of a large system.
In general, competitions of complex systems are a very far cry from a
scientific experiment where one changes one variable and sees how that affects
the behavior of a system. Changing ten thousand components at a time is guaranteed
to tell you nothing about the effects of any individual component, especially
in a system like an ontology where a small change in one component can have a
significant effect on performance. If you detect a difference in performance,
you cannot tell with any confidence what caused the difference. At best,
the lesson about Cyc from HALO is that having a lot of common sense won’t
make up for lack of expertise in specialized tasks. Duh. I thought
we learned that a few thousand years ago – though for computer systems
the same lesson was learned only about 30 years ago.
Another “lesson” from HALO is that the cost of programming
textbook knowledge by hand is about ten thousand dollars per page.
OK. But then they go out on a limb and say that that is too
expensive. What? That’s not right. If a college
education in engineering requires that a student learn a textbook per semester
per course, that would be 32 textbooks for a four-year, four-course-per-semester
education. At 500 pages each, that’s 16,000 pages – a cost of $160
million to encode. Less than we spend on a Mars probe that crashes
without sending back any data (a computer might have done better).
Sounds like a lot to educate one student, but now the knowledge can be
duplicated indefinitely, and we get trained engineers for the cost of one workstation
each. A bargain, I would say. But of course, that is
nonsense. Without the basic common sense, learning ability, and other
accoutrements of human intelligence even that system won’t perform at the
level of a trained engineer.
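The back-of-the-envelope cost arithmetic above can be checked directly. All of the figures are the ones given in the post itself ($10,000 per page from the HALO estimate, four courses per semester over eight semesters, one textbook per course per semester, 500 pages each):

```python
# Sanity check of the textbook-encoding cost estimate, using the
# figures stated in the post (none of these numbers are new).
COST_PER_PAGE = 10_000        # dollars per page, from the HALO estimate
COURSES_PER_SEMESTER = 4
SEMESTERS = 8                 # four years, two semesters per year
PAGES_PER_TEXTBOOK = 500

textbooks = COURSES_PER_SEMESTER * SEMESTERS      # 32 textbooks
pages = textbooks * PAGES_PER_TEXTBOOK            # 16,000 pages
total_cost = pages * COST_PER_PAGE                # $160,000,000

print(textbooks, pages, total_cost)  # 32 16000 160000000
```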
Getting the knowledge into a computer is *not* the
bottleneck. Knowing how to use that knowledge effectively *is*.
That’s the stage we are at with ontologies. We know how to
build ontologies, but we still haven’t figured out how to use them effectively
for real-life purposes.
John Sowa suggested at one time that the best “challenge”
problem would be to reproduce the language capabilities of a 5-year-old. I
agree, but I think that before it can become a “challenge” we still
need to develop some basic standards so that research will be reusable. To
begin, I believe that it is perfectly feasible to develop a common foundation
ontology to serve as a basic standard of meaning, provided that the project to develop
it is adequately funded – from 5 to 20 million dollars – so as to
include substantive input from a representative sample of the developers and
users of ontologies. Some standard for reasoning – perhaps IKL
– would also be needed. It doesn’t matter that some or even a
majority of potential users do not choose to use the standard(s); it is only
important that a large enough community use them to create an effective research
community and to encourage third-party vendors to supply utilities that make the
standards easier to use. Language is sufficiently modular that, given a common
standard of meaning and common reasoning method, the results of research will
be highly reusable and a language-understanding system that approaches human level
can begin to evolve within that community. At some later point in
development, a “challenge” may be useful.
On Behalf Of Pat Hayes
Sent: Wednesday, February 27, 2008 3:29 PM
Subject: Re: [ontolog-forum] Search engine for the ontology
At 3:17 PM -0500 2/27/08, John F. Sowa wrote:
That property is only true of toy ontologies that are unable
to deal with problems that anyone would actually pay to solve:
AA> The uniqueness of ontology is that it is a single concept
AA> scheme uniformly covering all things in the world.
Cyc is the largest formal ontology that has ever been implemented,
and after the first five years (from 1984 to 1989), they had to
supplement their uniform upper-level ontology with a large and
growing number of "microtheories" for the various specializations.
Just to emphasize: in Cyc, 'microtheories' are now a central
methodology, precisely because there is no "single concept scheme covering
all things", but instead, concepts must be 'tuned' or 'fitted' to
particular contexts or tasks or topics. The concept of nucleus for
example has a lot in common between biology, atomic physics, astronomy and
linguistics, but all these fields use the term with different exact meanings
and emphases. Simply distinguishing bio-nucleus from physics-nucleus, etc., as
distinct concepts, loses the commonality; treating them all as instances of one
super-concept leads to confusion and inconsistency.
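The microtheory idea can be illustrated with a minimal sketch. This is not Cyc’s actual API; the class and method names here are invented for illustration. The point is only the mechanism: the same term (“nucleus”) carries different assertions in different contexts, while a shared parent context holds what the senses have in common:

```python
# A toy sketch of microtheory-style context scoping (NOT Cyc's real API).
# Assertions are local to a context; a child context inherits from its parent,
# so the shared sense of a term is stated once while field-specific senses
# stay separate instead of being merged into one confused super-concept.

class Microtheory:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.facts = {}  # term -> set of assertions valid in this context

    def assert_fact(self, term, fact):
        self.facts.setdefault(term, set()).add(fact)

    def lookup(self, term):
        """Assertions visible here: local ones plus those inherited from the parent."""
        inherited = self.parent.lookup(term) if self.parent else set()
        return inherited | self.facts.get(term, set())

base = Microtheory("BaseKB")
base.assert_fact("nucleus", "central part of a larger structure")

biology = Microtheory("BiologyMt", parent=base)
biology.assert_fact("nucleus", "membrane-bound organelle containing DNA")

physics = Microtheory("AtomicPhysicsMt", parent=base)
physics.assert_fact("nucleus", "dense region of protons and neutrons")

# The shared sense is visible in both contexts; the field-specific
# senses do not leak from one context into the other.
print(biology.lookup("nucleus"))
print(physics.lookup("nucleus"))
```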
After 20 years (around 2004), they had defined an ontology
with about 600,000 categories, 2 million axioms, and 6,000 microtheories.
Then there was the HALO project, sponsored by Bill Gates's former
buddy, Paul Allen. For that project, three groups -- Cycorp,
OntoPrise, and SRI International -- were given the task of
representing some pages from a chemistry textbook in their
favorite notation and solving some problems that would be
typical of a college freshman course in chemistry.
Despite the fact that Cyc had a much larger knowledge base than
the other two groups, that knowledge did not help them. The average cost
for all three groups to translate the text into their notation
was about the same -- $10,000 per page. The average score on
the exam was about 40% to 47% correct, and Cyc had the lowest score.
We (a university-based consortium led by SRI) also beat Cyc
in the RKF competition, largely because we used a graphical human interface
rather than a text-based one. Natural language, it turns out, is a lousy way to
communicate with a computer.