Re: [ontolog-forum] Axiomatic ontology

 John F. Sowa wrote:

> There is a very big difference between searching the WWW
> for an arbitrary triple and searching for an exact match.

Indeed. That is why I wrote quoted strings for some of the search results and tuples for the others.

> If you check Google for a quoted string, you get:
>
>   "A tulip is a flower"     12 hits
>
>   "A flower is a plant"     44 hits
>
>   "A flower is a skunk"      0 hits
>
>   "Flower is a skunk"       11 hits

John must be applying some filter to the Google results. When I asked for matches on "flower is a skunk", Google reported 1100 hits, not 11. For "tulip is a flower" it reported 6070 hits, not 12, and for "a tulip is a flower", 3600 hits.

> That is not bad at all.

Nor is it good for anything. Is it clear that "a tulip is a flower" means "every tulip is a flower"? I seem to recall that 19th-century logicians spent a great deal of time arguing about this. A search on "tulips are flowers" gets only 724 hits, while "flowers are tulips" gets 2700! Note also that one gets only 1 hit on "every tulip is a flower", and that is from a logic text. "All tulips are flowers" gets 8 hits, but "all flowers are tulips" gets 6! Out of context, the Web is seriously misinformed (to say nothing of the in-context misinformation promulgated by the ignorant).

I admit that "flower is a skunk" was a ringer. I chose it precisely to demonstrate that the symbol "flower" has more than one well-understood meaning, and taking any bald statement about "flower" out of context makes it very difficult to determine which of two almost totally unrelated meanings was intended. I should think that these concepts could, however, be easily separated by examining the use of "flower" with other concepts related to flowers (of the plant kind).
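
A crude version of that separation can be sketched as a context-word overlap test, in the spirit of the simplified Lesk heuristic. Everything here is an illustrative assumption: the two sense labels, the hand-picked signature words (the second sense is taken to be Flower, the skunk in Disney's Bambi), and the function names are hypothetical, and a real system would need far larger, curated context sets.

```python
import re
from typing import Optional

# Hypothetical signature words for two senses of "flower"; hand-picked for
# illustration only, not drawn from any lexical resource.
SENSES = {
    "botanical": {"petal", "petals", "plant", "tulip", "bloom", "garden", "stem", "leaf"},
    "cartoon skunk": {"skunk", "bambi", "thumper", "disney", "cartoon", "character"},
}


def tokens(text: str) -> set:
    """Return the set of lower-case alphabetic words in a snippet."""
    return set(re.findall(r"[a-z]+", text.lower()))


def guess_sense(snippet: str) -> Optional[str]:
    """Return the sense whose signature shares the most words with the snippet.

    Returns None when no signature word occurs or the top score is tied,
    i.e. when the context offers no evidence for one sense over the other.
    """
    words = tokens(snippet)
    scores = {sense: len(words & sig) for sense, sig in SENSES.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


print(guess_sense("Flower is a skunk who befriends Bambi and Thumper"))     # cartoon skunk
print(guess_sense("The tulip is a flower with six petals on a tall stem"))  # botanical
print(guess_sense("A flower is a thing"))                                   # None
```

This works for the skunk case only because the two senses share almost no context words; the bald statement "A flower is a thing" gives it nothing to go on.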

But there are much more difficult cases. "A flower is a plant" and "A flower is a part of a plant" demand importantly different definitions of the term "flower". But the two senses co-occur so commonly with most of the same related terms that any naive algorithm for extracting the distinction from an arbitrary Web-based text corpus will become very confused.

Avril asserted that one can get an ontology this way. If "ontology" is weakened to mean "a statistical association of a body of terms", then what she says is true. We can conclude that "flower" is somehow related to "plant", and that "tulip", "stem" and "leaf" often occur in the same contexts. But we can never get the distinction between "flower" the verb that applies to plants, "flower" the botanical part, and "flower" the synonym for "flowering plant" from a simple association mechanism. And the vagaries of natural-language usage and the inaccuracies of phrasing on the Web, together with the closeness in meaning of these senses, are such that one cannot reliably refine that set of associations into a formal ontology suitable for automated reasoning.
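
To make the "statistical association" reading concrete, here is a minimal sketch of one naive association mechanism, assuming it amounts to counting which content words share a sentence. The five-sentence corpus, the stopword list, and the helper names are hypothetical. The counts do relate "flower" to "plant" and "tulip" to "stem" and "leaf", but the is-a sentence and the part-of sentence each add 1 to the same flower/plant count, and the verb use in "Roses flower in June" is counted under the same token as the noun.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; just enough to drop the function words below.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "in", "has"}


def cooccurrence(sentences):
    """Count how often each unordered pair of content words shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(re.findall(r"[a-z]+", sentence.lower())) - STOPWORDS)
        # combinations() of a sorted list yields alphabetically ordered pairs
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def association(counts, w1, w2):
    """Look up a pair's count regardless of argument order (0 if never seen)."""
    return counts[tuple(sorted((w1, w2)))]


corpus = [
    "A tulip is a flower.",            # tulip is-a flower
    "A flower is a plant.",            # "flower" = flowering plant
    "A flower is a part of a plant.",  # "flower" = botanical part
    "The tulip has a stem and a leaf.",
    "Roses flower in June.",           # "flower" as a verb
]
counts = cooccurrence(corpus)

print(association(counts, "plant", "flower"))  # 2: is-a and part-of look alike
print(association(counts, "flower", "roses"))  # 1: verb use, same token
print(association(counts, "stem", "leaf"))     # 1
```

The only trace of the part-of reading is an extra flower/part pair, and nothing in a table of counts says what relation that pair stands for.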

Now, as I said, IF you restrict the text corpora to a set of expositions known to be careful, I think you might be able to extract an ontology from them. And the work I have seen in this area seems to be directed primarily at that approach. You may well be able to construct an axiomatic ontology from a textbook on nuclear physics or French literature. But the value of that ontology is probably directly proportional to the degree to which that textbook is an accepted reference, or at any rate matches common doctrine in that field. And you can't determine that from the textbook itself, although it may well have been determined by the human agent who elected to use that textbook to construct the ontology.

-Ed

"The trick is to glean from an experience exactly the knowledge that is contained in it. A cat which sits down on a hot stove will never do it again, but it will never sit on a cold stove again either." -- Mark Twain

--
Edward J. Barkmeyer                     Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263                 Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263                 FAX: +1 301-975-4694

"The opinions expressed above do not reflect consensus of NIST,
 and have not been reviewed by any Government authority."
