
Re: [ontolog-forum] Axiomatic ontology

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Ed Barkmeyer <edbark@xxxxxxxx>
Date: Fri, 01 Feb 2008 11:51:53 -0500
Message-id: <47A34E29.4010409@xxxxxxxx>
Avril Styrman wrote:    (01)

>>> A large number of simple "X is Y" NL sentences gives a quite
>>> good approximation that "X is a subclass of Y"    (02)

And, at Pat Hayes's urging, viz.:
>> OK, if you believe this, find me some. Actual examples from the Web.    (03)

Avril found:
> 6 040  hits "tulip is a flower".
> 56 200 hits "flower is a plant".
> plant > flower > tulip    (04)

There are also 1130 hits on "flower is a skunk".    (05)

Ergo??? skunk > flower > tulip ???    (06)

And there are 646,000 hits for the tuple (flower, is, skunk), against 
14,000,000 for "skunk" alone, which, per John Sowa's approach, tells us 
that skunks are related to flowers somehow, although perhaps only about 
a third as often as parties are related to celebrations.    (07)

> In course of objectivity, we also get:
> 67 000 hits "plant is a flower"
> 4 100  hits "flower is a tulip"
> tulip > flower > plant  
> The point is, that we do get all these very
> easily, and after we have got them, we can 
> reason about them based on their relations.     (08)

I am curious to know what system of reasoning Avril plans to employ.
(Well, actually, I'm not that curious. :-|)    (09)

Based on the apparently valid "relations":  flower is a subcategory of 
plant (57,000 assertions) and plant is a subcategory of flower (64,000 
assertions), even Lotfi Zadeh's fuzzy reasoning would conclude that 
there is a very high probability that the concepts "flower" and "plant" 
are identical.  Further, a search for "plant that is not a flower" 
produced NO hits.  This seems to raise the probability to near 
certainty.  Can we therefore conclude that there are no plants that are 
not flowers?  On the other hand, we get 130,000 hits on (plant, "not 
flower").  What shall we make of that?    (010)
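A minimal Python sketch of what naive reasoning over these "relations" does, using only the "X is a Y" hit counts quoted above. The acceptance rule (any phrase with more than a hypothetical THRESHOLD of 1,000 hits becomes a subclass edge, which all five phrases pass) is an assumption for illustration, not anything Avril proposed. The point is only that the resulting "hierarchy" has cycles: tulip ends up as its own superclass with skunk above it, and flower, plant and tulip collapse into a single class.

```python
# "X is a Y" hit counts quoted earlier in this thread.
HITS = {
    ("tulip", "flower"): 6_040,
    ("flower", "plant"): 56_200,
    ("flower", "skunk"): 1_130,
    ("plant", "flower"): 67_000,
    ("flower", "tulip"): 4_100,
}

# Hypothetical cut-off: treat "X is a Y" as "X subclass-of Y" above this many hits.
THRESHOLD = 1_000


def naive_subclass_edges(hits, threshold):
    """Turn every sufficiently frequent "X is a Y" phrase into a subclass edge."""
    edges = {}
    for (sub, sup), n in hits.items():
        if n > threshold:
            edges.setdefault(sub, set()).add(sup)
    return edges


def superclasses(edges, start):
    """Every class reachable upward from `start` (transitive closure of the edges)."""
    seen, stack = set(), [start]
    while stack:
        for sup in edges.get(stack.pop(), ()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def equivalent_to(edges, cls):
    """Other classes that are both above and below `cls`, i.e. share a cycle with it."""
    return {other for other in superclasses(edges, cls)
            if other != cls and cls in superclasses(edges, other)}


edges = naive_subclass_edges(HITS, THRESHOLD)
print("superclasses of tulip:", sorted(superclasses(edges, "tulip")))
print("classes equivalent to flower:", sorted(equivalent_to(edges, "flower")))
```

Raising the threshold does not help: at 50,000 hits only the flower/plant pair survives, and that pair by itself is already a cycle.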

I fully agree with Pat that the "unguided" analysis of unstructured text 
on the Web (or anything other than a carefully selected corpus) has not 
produced, nor is it likely ever to produce, anything useful as an 
"ontology".    (011)

We must not, however, confuse this with the very good and effective work 
on knowledge acquisition from unstructured text that is "guided" by a 
reference ontology.  That approach provides the search engine with a 
"starter ontology" that defines the principal concepts and relationships 
in the domain and then extends that knowledge base (by Bayesian 
analysis) by extracting and interpreting natural language from a broader 
corpus.  Part of that process is to discard documents that don't seem to 
be consistent with, or closely related to, the reference ontology.  In 
this way, the reference ontology provides a means for deciding whether a 
given use of a term is likely to have the same sense as the use in the 
ontology.  (And it doesn't involve an explicit concept of "context" -- 
the context is the other concepts and relationships in the reference 
ontology.)    (012)

So, if we start with an ontology that tells the engine that a flower is 
a part of a plant, and not necessarily of every plant, and that a skunk 
is an animal, and that the intersection of plant and animal is empty, we 
will get much better results from the examination of the corpus 
delivered by the Google searches above.    (013)
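By contrast, a guided engine can test each extracted "X is a Y" phrase against its reference ontology before accepting it. The sketch below is a toy under stated assumptions: the facts that a skunk is an animal, that a flower is part of a plant, and that plant and animal are disjoint come from the paragraph above; the "plant_part" class and its disjointness from "plant" and "animal" are added assumptions, and "skunk is an animal" is an illustrative input rather than a measured search result. It shows only the filtering step, with hard rules in place of any Bayesian weighting.

```python
# Toy reference ontology (hypothetical). "skunk is an animal", "flower is part
# of a plant" and "plant and animal are disjoint" follow the text; the
# "plant_part" class and its two extra disjointness axioms are assumptions.
SUBCLASS_OF = {"skunk": "animal", "flower": "plant_part"}
PART_OF = {("flower", "plant")}
DISJOINT = {
    frozenset({"plant", "animal"}),
    frozenset({"plant_part", "animal"}),
    frozenset({"plant_part", "plant"}),
}
KNOWN = {"plant", "animal", "plant_part", "skunk", "flower"}


def ancestors(cls):
    """`cls` itself plus every class above it in the reference hierarchy."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return set(chain)


def judge(sub, sup):
    """Classify an extracted "sub is a sup" phrase against the reference ontology."""
    if sub not in KNOWN or sup not in KNOWN:
        return "set aside: term not in reference ontology"
    if (sub, sup) in PART_OF or (sup, sub) in PART_OF:
        return "reject: part-whole relation, not kind-of"
    # If sub were under sup, sub would fall under both ancestor sets;
    # a disjoint pair means sub would have no possible members.
    if any(frozenset({a, b}) in DISJOINT
           for a in ancestors(sub) for b in ancestors(sup)):
        return "reject: ancestors are disjoint"
    if sup in ancestors(sub):
        return "accept: already entailed"
    return "accept: candidate new subclass link"


# Phrases from the searches above, plus one illustrative consistent phrase.
for sub, sup in [("flower", "plant"), ("plant", "flower"), ("flower", "skunk"),
                 ("tulip", "flower"), ("skunk", "animal")]:
    print(f"{sub} is a {sup} -> {judge(sub, sup)}")
```

Note that "tulip" is simply set aside because the toy ontology does not mention it, which mirrors discarding material that is not closely related to the reference ontology.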

Of course, if we start with a Creationist ontology, our engine will also 
reject all documents about "evolution" that reflect the Darwinian 
theory.  And similar things will happen to conflicting scientific 
theories of astrophysics.  The guided engine will treat the reference 
ontology as "divine revelation".  But we have to crawl before we walk. 
We have by no means perfected even the guided analysis techniques. 
After we are comfortable with understanding natural language text based 
on reference ontologies, we can start dealing with belief, evidence, 
argument, and contradiction.  The problem with raw analysis of the Web 
as a corpus is that we are immediately confronted with all of those, and 
the thousand other shocks that natural language is heir to.    (014)

-Ed    (015)

Edward J. Barkmeyer                        Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263                Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263                FAX: +1 301-975-4694    (016)

"The opinions expressed above do not reflect consensus of NIST,
  and have not been reviewed by any Government authority."    (017)

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx    (018)
