Re: [ontolog-forum] Axiomatic ontology

 Ed Barkmeyer Fri, 01 Feb 2008
 Avril Styrman wrote:    (01)

>>> A large number of simple "X is Y" NL sentences gives a quite
>>> good approximation that "X is a subclass of Y"    (02)

And on urging from Pat Hayes, viz.:
>> OK, if you believe this, find me some. Actual examples from the Web.    (03)

Avril found:
> 6 040 hits "tulip is a flower".
> 56 200 hits "flower is a plant".
> plant > flower > tulip    (04)

There are also 1130 hits on "flower is a skunk".    (05)

Ergo??? skunk > flower > tulip ???    (06)

And there are 646,000 hits for the tuple (flower, is, skunk), against
14,000,000 for "skunk" alone, which, per John Sowa's approach, tells us
that skunks are related to flowers somehow, although perhaps 3 times
less often than parties are related to celebrations.    (07)

> In course of objectivity, we also get:
>
> 67 000 hits "plant is a flower"
> 4 100 hits "flower is a tulip"
> tulip > flower > plant
>
> The point is, that we do get all these very
> easily, and after we have got them, we can
> reason about them based on their relations.    (08)

I am curious to know what system of reasoning Avril plans to employ.
(Well, actually, I'm not that curious. :-|)    (09)

Based on the apparently valid "relations": flower is a subcategory of
plant (56,200 assertions) and plant is a subcategory of flower (67,000
assertions), even Lotfi Zadeh's fuzzy reasoning would conclude that
there is a very high probability that the concepts "flower" and "plant"
are identical.  Further, a search for "plant that is not a flower"
produced NO hits.  This seems to raise the probability to near
certainty.  Can we therefore conclude that there are no plants that are
not flowers?  On the other hand, we get 130,000 hits on (plant, "not
flower").  What shall we make of that?    (010)
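
To make the objection concrete, here is a minimal sketch of the naive rule
"enough hits for 'X is a Y' means X is a subclass of Y", applied to the
phrase counts quoted in this thread.  The threshold of 1000 and the function
names are assumptions made for illustration; nobody in the thread proposed a
specific cutoff.

```python
# Hit counts quoted in this thread for the exact phrase "X is a Y".
HITS = {
    ("tulip", "flower"): 6040,
    ("flower", "plant"): 56200,
    ("flower", "skunk"): 1130,
    ("plant", "flower"): 67000,
    ("flower", "tulip"): 4100,
}


def naive_subclass_edges(hits, threshold=1000):
    """Accept 'X subclass-of Y' whenever the phrase count reaches the threshold.

    The threshold is a hypothetical cutoff, not a value from the thread.
    """
    return {pair for pair, count in hits.items() if count >= threshold}


def mutual_pairs(edges):
    """Return the unordered pairs asserted as subclasses in both directions."""
    return {frozenset(edge) for edge in edges if (edge[1], edge[0]) in edges}


edges = naive_subclass_edges(HITS)
conflicts = mutual_pairs(edges)

for pair in sorted(sorted(p) for p in conflicts):
    print("asserted both ways:", pair)
```

With these counts the rule accepts "flower is a skunk" and asserts both
flower/plant and flower/tulip in each direction, so the resulting
"hierarchy" contains cycles rather than a taxonomy.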

I fully agree with Pat that the "unguided" analysis of unstructured
text on the Web (or anything other than a carefully selected corpus)
has not produced, nor is it likely ever to produce, anything useful as
an "ontology".    (011)

We must not, however, confuse this with the very good and effective
work on knowledge acquisition from unstructured text that is "guided"
by a reference ontology.  That approach provides the search engine with
a "starter ontology" that defines the principal concepts and
relationships in the domain and then extends that knowledge base (by
Bayesian analysis) by extracting and interpreting natural language from
a broader corpus.  Part of that process is to discard documents that
don't seem to be consistent with, or closely related to, the reference
ontology.  In this way, the reference ontology provides a means for
deciding whether a given use of a term is likely to have the same sense
as the use in the ontology.  (And it doesn't involve an explicit
concept of "context" -- the context is the other concepts and
relationships in the reference ontology.)    (012)
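
The document-discarding step described above might be sketched roughly as
follows.  This is only an illustration of the filtering idea, not the
Bayesian analysis itself: the term list, the overlap score, and the 0.4
threshold are all assumptions invented for the example.

```python
import re

# Hypothetical terms taken from a small botanical reference ontology.
REFERENCE_TERMS = {"flower", "plant", "petal", "stem", "leaf"}


def relatedness(text, terms=REFERENCE_TERMS):
    """Fraction of the reference terms that occur as words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(terms & words) / len(terms)


def keep_documents(docs, threshold=0.4):
    """Discard documents that share too few terms with the reference ontology."""
    return [doc for doc in docs if relatedness(doc) >= threshold]


docs = [
    "The flower of this plant has a long stem.",
    "The skunk sprayed the dog; what a flower.",
]
kept = keep_documents(docs)
print(kept)
```

Here the second document mentions "flower" but shares nothing else with the
reference terms, so it is dropped; the other concepts in the ontology are
doing the work that an explicit notion of "context" would otherwise do.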

So, if we start with an ontology that tells the engine that a flower is
a part of a plant, and not necessarily of every plant, and that a skunk
is an animal, and that the intersection of plant and animal is empty,
we will get much better results from the examination of the corpus
delivered by the Google searches above.    (013)
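
As a rough sketch of how such a starter ontology could screen the candidate
"X is a Y" assertions, assume the three facts above are encoded as simple
tables.  The data layout, the function names, and the rule that a part
inherits the domain of its whole are assumptions made for this example, not
a description of any particular engine.

```python
# Starter ontology from the paragraph above, in a hypothetical encoding.
SUBCLASS_OF = {"skunk": "animal"}   # a skunk is an animal
PART_OF = {"flower": "plant"}       # a flower is a part of (some) plants
DISJOINT = [("plant", "animal")]    # no plant is an animal


def domains(term):
    """Collect the term, its superclasses, and the wholes it is a part of.

    Treating a part as belonging to the domain of its whole is an
    assumption of this sketch.
    """
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        for relation in (SUBCLASS_OF, PART_OF):
            if current in relation:
                stack.append(relation[current])
    return found


def judge(sub, sup):
    """Screen the candidate assertion 'sub is a subclass of sup'."""
    if PART_OF.get(sub) == sup or PART_OF.get(sup) == sub:
        return "reject: part-whole, not subclass"
    d_sub, d_sup = domains(sub), domains(sup)
    for a, b in DISJOINT:
        if (a in d_sub and b in d_sup) or (b in d_sub and a in d_sup):
            return "reject: disjoint domains"
    return "keep for further evidence"


for candidate in [("flower", "plant"), ("plant", "flower"),
                  ("flower", "skunk"), ("tulip", "flower")]:
    print(candidate, "->", judge(*candidate))
```

Under these assumptions both flower/plant candidates are rejected as
part-whole confusions, "flower is a skunk" is rejected by disjointness, and
"tulip is a flower" survives as a candidate for further evidence.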

Of course, if we start with a Creationist ontology, our engine will
also reject all documents about "evolution" that reflect the Darwinian
theory.  And similar things will happen to conflicting scientific
theories of astrophysics.  The guided engine will treat the reference
ontology as "divine revelation".  But we have to crawl before we walk.
We have by no means perfected even the guided analysis techniques.
After we are comfortable with understanding natural language text based
on reference ontologies, we can start dealing with belief, evidence,
argument, and contradiction.  The problem with raw analysis of the Web
as a corpus is that we are immediately confronted with all of those,
and the thousand other shocks that natural language is heir to.    (014)

-Ed    (015)

--
Edward J. Barkmeyer                        Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263                Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263                FAX: +1 301-975-4694

"The opinions expressed above do not reflect consensus of NIST,
  and have not been reviewed by any Government authority."    (016)
