Re: [ontolog-forum] Axiomatic ontology

To: "John F. Sowa" <sowa@xxxxxxxxxxx>
Cc: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Ed Barkmeyer <edbark@xxxxxxxx>
Date: Fri, 01 Feb 2008 17:21:47 -0500
Message-id: <47A39B7B.7010603@xxxxxxxx>

John F. Sowa wrote:    (01)

> There is a very big difference between searching the WWW
> for an arbitrary triple and searching for an exact match.    (02)

Indeed. That is why I wrote quoted strings for some of the search 
results and tuples for the others.    (03)

> If you check Google for a quoted string, you get:
>    "A tulip is a flower"    12 hits
>    "A flower is a plant"    44 hits
>    "A flower is a skunk"     0 hits
>    "Flower is a skunk"      11 hits    (04)

John is employing some filter on the Google results.  When I asked for 
matches on "flower is a skunk", Google reported 1100 hits, not 11.  For 
"tulip is a flower" it reported 6070 hits, not 12, and for "a tulip is a 
flower", 3600 hits.    (05)

> That is not bad at all.    (06)

Nor is it good for anything.  Is it clear that "a tulip is a flower" 
means "every tulip is a flower"?  I seem to recall that 19th-century 
logicians spent a great deal of time arguing about this.  A search on 
"tulips are flowers" gets only 724 hits, while "flowers are tulips" gets 
2700!  Note also that one gets only 1 hit on "every tulip is a flower", 
and that is from a logic text.  "All tulips are flowers" gets 8 hits, 
but "all flowers are tulips" gets 6!  Out of context, the Web is 
seriously misinformed (to say nothing of in-context misinformation 
promulgated by the ignorant).    (07)

I admit that "flower is a skunk" was a ringer.  I chose it precisely to 
demonstrate that the symbol "flower" has more than one well-understood 
meaning, and taking any bald statement about "flower" out of context 
makes it very difficult to determine which of two almost totally 
unrelated meanings was intended.  I should think that these concepts 
could, however, be easily separated by examining the use of "flower" 
with other concepts related to flowers (of the plant kind).    (08)
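
A minimal sketch of that kind of separation, in Python.  The cue-word 
lists and the names SENSE_CUES, tokenize and guess_sense are invented 
for illustration; a real system would have to learn such cues from a 
corpus, and this is not a method anyone in the thread proposed.

```python
import re

# Hypothetical cue words for two senses of "flower" (illustrative only).
SENSE_CUES = {
    "plant": {"tulip", "petal", "stem", "leaf", "garden", "plant"},
    "skunk": {"skunk", "bambi", "thumper"},
}

def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def guess_sense(sentence):
    """Return the sense whose cue words overlap the sentence most.

    Returns None when no cue word is present.  Ties go to the sense
    listed first in SENSE_CUES, which is an arbitrary choice.
    """
    words = set(tokenize(sentence))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(guess_sense("A tulip is a flower with a long stem"))  # plant
print(guess_sense("Flower is a skunk"))                     # skunk
print(guess_sense("Flower is a word"))                      # None
```

The sketch only works because the two senses share almost no related 
terms.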

But there are much more difficult cases. "A flower is a plant" and "A 
flower is a part of a plant" demand importantly different definitions of 
the term "flower".  But both usages will appear so commonly with most of 
the same related terms that any naive algorithm for extracting the 
distinction from an arbitrary Web-based text corpus will become very 
confused.    (09)

Avril asserted that one can get an ontology this way.  If "ontology" is 
weakened to mean "a statistical association of a body of terms", then 
what she says is true.  We can conclude "flower" is somehow related to 
"plant", and "tulip" and "stem" and "leaf" often occur in the same 
contexts.  But we can never get the distinction between "flower" the 
verb that applies to plants, "flower" the botanical part, and "flower" 
the synonym for "flowering plant" from a simple association mechanism. 
And the vagaries of natural language usage, and the inaccuracies in 
phraseology on the Web, together with the proximity in meaning of these 
terms, are such that one cannot reliably refine that set of associations 
to a formal ontology suitable for automated reasoning.    (010)
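
To make the limitation concrete, here is a rough Python sketch of a 
"simple association mechanism": it counts how often pairs of terms occur 
in the same sentence.  The vocabulary, the sample sentences and the 
function name cooccurrence are assumptions made up for the example, not 
anything taken from an actual extraction tool.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary of interest (illustrative only).
VOCAB = {"flower", "plant", "tulip", "stem", "leaf"}

def cooccurrence(sentences):
    """Count, per sentence, each unordered pair of VOCAB terms it contains."""
    counts = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        terms = sorted(words & VOCAB)  # sorted so each pair has one key
        counts.update(combinations(terms, 2))
    return counts

corpus = [
    "A flower is a plant",               # "flower" as flowering plant
    "A flower is a part of a plant",     # "flower" as botanical part
    "The plant will flower in spring",   # "flower" as verb
]

counts = cooccurrence(corpus)
print(counts[("flower", "plant")])  # 3
```

All three sentences add to the same ("flower", "plant") count, so the 
result says that the two terms are related but nothing about which of 
the three meanings of "flower" was involved in any one of them.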

Now, as I said, IF you restrict the text corpora to a set of expositions 
known to be careful, I think you might be able to extract an ontology 
from them.  And the work I have seen in this area seems primarily to be 
directed to that approach.  You may well be able to construct an 
axiomatic ontology from a textbook on nuclear physics or French 
literature.  But the value of that ontology is probably directly 
proportional to the degree to which that textbook is an accepted 
reference, or at any rate matches common doctrine in that field.  And 
you can't determine that from the textbook itself, although it may well 
have been determined by the human agent who elected to use that textbook 
to construct the ontology.    (011)

-Ed    (012)

"The trick is to glean from an experience exactly the knowledge that is
contained in it.  A cat which sits down on a hot stove will never do it
again, but it will never sit on a cold stove again either."
   -- Mark Twain    (013)

Edward J. Barkmeyer                        Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263                Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263                FAX: +1 301-975-4694    (014)

"The opinions expressed above do not reflect consensus of NIST,
  and have not been reviewed by any Government authority."    (015)

