Avril Styrman wrote: (01)
>>> A large number of simple "X is Y" NL sentences gives a quite
>>> good approximation that "X is a subclass of Y" (02)
And on urging from Pat Hayes, viz.:
>> OK, if you believe this, find me some. Actual examples from the Web. (03)
Avril found:
> 6 040 hits "tulip is a flower".
> 56 200 hits "flower is a plant".
> plant > flower > tulip (04)
There are also 1130 hits on "flower is a skunk". (05)
Ergo??? skunk > flower > tulip ??? (06)
And there are 646,000 hits for the tuple (flower, is, skunk), against
14,000,000 for "skunk" alone, which, per John Sowa's approach, tells us
that skunks are related to flowers somehow, although perhaps 3 times
less often than parties are related to celebrations. (07)
> In course of objectivity, we also get:
>
> 67 000 hits "plant is a flower"
> 4 100 hits "flower is a tulip"
> tulip > flower > plant
>
> The point is, that we do get all these very
> easily, and after we have got them, we can
> reason about them based on their relations. (08)
I am curious to know what system of reasoning Avril plans to employ.
(Well, actually, I'm not that curious. :-|) (09)
Based on the apparently valid "relations": flower is a subcategory of
plant (57,000 assertions) and plant is a subcategory of flower (64,000
assertions), even Lotfi Zadeh's fuzzy reasoning would conclude that
there is a very high probability that the concepts "flower" and "plant"
are identical. Further, a search for "plant that is not a flower"
produced NO hits. This seems to raise the probability to near
certainty. Can we therefore conclude that there are no plants that are
not flowers? On the other hand, we get 130,000 hits on (plant, "not
flower"). What shall we make of that? (010)
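To make the difficulty concrete, here is a minimal sketch of the naive inference, using only the hit counts Avril quoted above. The 1000-hit cutoff and the function names are hypothetical, chosen purely for illustration; nobody in this thread proposed them.

```python
# Hit counts quoted earlier in this message for phrases of the form "X is a Y".
HITS = {
    ("tulip", "flower"): 6040,
    ("flower", "plant"): 56200,
    ("flower", "skunk"): 1130,
    ("plant", "flower"): 67000,
    ("flower", "tulip"): 4100,
}


def naive_subclass_edges(hits, threshold=1000):
    """Read every phrase with at least `threshold` hits as 'X is a subclass of Y'.

    The threshold is an arbitrary illustrative cutoff, not a recommended value.
    """
    return {pair for pair, count in hits.items() if count >= threshold}


def mutual_pairs(edges):
    """Return the pairs asserted in both directions, each listed once."""
    return sorted((x, y) for (x, y) in edges if (y, x) in edges and x < y)


edges = naive_subclass_edges(HITS)
print(mutual_pairs(edges))  # pairs the naive reading makes subclasses of each other
```

Under that reading, flower and plant come out as subclasses of each other, as do flower and tulip, and "flower is a skunk" clears the same cutoff. Raising the cutoff does not help: the counts for the reversed phrases are of the same order as those for the intended ones.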
I fully agree with Pat that the "unguided" analysis of unstructured text
on the Web (or anything other than a carefully selected corpus) has not
produced, nor is it likely ever to produce, anything useful as an
"ontology". (011)
We must not, however, confuse this with the very good and effective work
on knowledge acquisition from unstructured text that is "guided" by a
reference ontology. That approach provides the search engine with a
"starter ontology" that defines the principal concepts and relationships
in the domain and then extends that knowledge base (by Bayesian
analysis) by extracting and interpreting natural language from a broader
corpus. Part of that process is to discard documents that don't seem to
be consistent with, or closely related to, the reference ontology. In
this way, the reference ontology provides a means for deciding whether a
given use of a term is likely to have the same sense as the use in the
ontology. (And it doesn't involve an explicit concept of "context" --
the context is the other concepts and relationships in the reference
ontology.) (012)
So, if we start with an ontology that tells the engine that a flower is
a part of a plant, and not necessarily of every plant, and that a skunk
is an animal, and that the intersection of plant and animal is empty, we
will get much better results from the examination of the corpus
delivered by the Google searches above. (013)
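As a rough illustration of that screening step, the sketch below encodes only the three facts named in the paragraph above and checks candidate "X is a Y" phrases against them. Everything else is an assumption made for the example: the dictionary layout, the function names, and in particular the rule that a part is placed in the same top-level domain as its whole. It is not the Bayesian analysis used by real guided systems, only the consistency filter that would precede it.

```python
# Hypothetical starter ontology, limited to the facts stated in the text.
IS_A = {"skunk": "animal"}                   # a skunk is an animal
PART_OF = {"flower": "plant"}                # a flower is a part of (some) plants
DISJOINT = {frozenset({"plant", "animal"})}  # no plant is an animal


def domain(term):
    """Follow is-a and part-of links upward to a top-level class.

    Assumption for this sketch: a part belongs to the domain of its whole.
    """
    seen = set()
    while term not in seen:
        seen.add(term)
        if term in IS_A:
            term = IS_A[term]
        elif term in PART_OF:
            term = PART_OF[term]
        else:
            break
    return term


def screen(x, y):
    """Check the candidate assertion 'x is a y' against the starter ontology."""
    if PART_OF.get(x) == y or PART_OF.get(y) == x:
        return "reject: part-whole, not subclass"
    if frozenset({domain(x), domain(y)}) in DISJOINT:
        return "reject: disjoint domains"
    return "undecided"


for x, y in [("flower", "plant"), ("plant", "flower"),
             ("flower", "skunk"), ("tulip", "flower")]:
    print(f"{x} is a {y}: {screen(x, y)}")
```

Note that "tulip is a flower" comes back undecided, since this tiny ontology says nothing about tulips. That is the case where the statistical analysis of the corpus would have to do the work.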
Of course, if we start with a Creationist ontology, our engine will also
reject all documents about "evolution" that reflect the Darwinian
theory. And similar things will happen to conflicting scientific
theories of astrophysics. The guided engine will treat the reference
ontology as "divine revelation". But we have to crawl before we walk.
We have by no means perfected even the guided analysis techniques.
After we are comfortable with understanding natural language text based
on reference ontologies, we can start dealing with belief, evidence,
argument, and contradiction. The problem with raw analysis of the Web
as a corpus is that we are immediately confronted with all of those, and
the thousand other shocks that natural language is heir to. (014)
-Ed (015)
--
Edward J. Barkmeyer Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263 Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263 FAX: +1 301-975-4694 (016)
"The opinions expressed above do not reflect consensus of NIST,
and have not been reviewed by any Government authority." (017)