John F. Sowa wrote:
> There is a very big difference between searching the WWW
> for an arbitrary triple and searching for an exact match.
Indeed. That is why I wrote quoted strings for some of the search
results and tuples for the others.
> If you check Google for a quoted string, you get:
>
> "A tulip is a flower" 12 hits
>
> "A flower is a plant" 44 hits
>
> "A flower is a skunk" 0 hits
>
> "Flower is a skunk" 11 hits
John is employing some filter on the Google results. When I asked for
matches on "flower is a skunk", it reported 1100 hits, not 11. For
"tulip is a flower" it reported 6070 hits, not 12, and for "a tulip is a
flower" 3600 hits.
> That is not bad at all.
Nor is it good for anything. Is it clear that "a tulip is a flower"
means "every tulip is a flower"? I seem to recall that 19th-century
logicians spent a great deal of time arguing about this. A search on
"tulips are flowers" gets only 724 hits, while "flowers are tulips" gets
2700! Note also that one gets only 1 hit on "every tulip is a flower",
and that is from a logic text. "All tulips are flowers" gets 8 hits,
but "all flowers are tulips" gets 6! Out of context, the Web is
seriously misinformed (to say nothing of in-context misinformation
promulgated by the ignorant).
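
To make the point concrete, here is a minimal sketch of what a naive extractor would conclude from the hit counts quoted above. The counts are the ones reported in this message and have not been re-checked; the rule "believe whichever direction gets more hits" and the function name naive_subclass_guess are hypothetical, invented only for illustration.

```python
# Hit counts exactly as quoted in this message; not re-verified.
HITS = {
    "tulips are flowers": 724,
    "flowers are tulips": 2700,
    "all tulips are flowers": 8,
    "all flowers are tulips": 6,
}


def naive_subclass_guess(a, b, template="{} are {}"):
    """Hypothetical rule: trust whichever direction has more hits.

    Returns (sub, super) for the winning direction, or None on a tie.
    """
    forward = HITS[template.format(a, b)]
    backward = HITS[template.format(b, a)]
    if forward > backward:
        return (a, b)
    if backward > forward:
        return (b, a)
    return None


# Unquantified phrasing: the rule picks the wrong direction.
print(naive_subclass_guess("tulips", "flowers"))
# Quantified phrasing: the rule picks the right direction, on 8 hits against 6.
print(naive_subclass_guess("tulips", "flowers", "all {} are {}"))
```

Under this rule the unquantified counts yield "flowers are tulips", and the quantified counts yield the right answer only by a margin of two hits.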
I admit that "flower is a skunk" was a ringer. I chose it precisely to
demonstrate that the symbol "flower" has more than one well-understood
meaning, and taking any bald statement about "flower" out of context
makes it very difficult to determine which of two almost totally
unrelated meanings was intended. I should think that these concepts
could, however, be easily separated by examining the use of "flower"
with other concepts related to flowers (of the plant kind).
But there are much more difficult cases. "A flower is a plant" and "A
flower is a part of a plant" demand importantly different definitions of
the term "flower". But they will appear so commonly with most of the
related terms that any naive algorithm for extracting the distinction
from an arbitrary Web-based text corpus will become very confused.
Avril asserted that one can get an ontology this way. If "ontology" is
weakened to mean "a statistical association of a body of terms", then
what she says is true. We can conclude "flower" is somehow related to
"plant", and "tulip" and "stem" and "leaf" often occur in the same
contexts. But we can never get the distinction between "flower" the
verb that applies to plants, "flower" the botanical part, and "flower"
the synonym for "flowering plant" from a simple association mechanism.
And the vagaries of natural language usage, and the inaccuracies in
phraseology on the Web, together with the proximity in meaning of these
terms, are such that one cannot reliably refine that set of associations
to a formal ontology suitable for automated reasoning.
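
A rough sketch of the "simple association mechanism" described above, assuming it amounts to counting which content words share a sentence. The three sentences are made up, one for each sense of "flower", and the stopword list and the function name cooccurrence are likewise assumptions for illustration, not anyone's actual method.

```python
from collections import Counter
from itertools import combinations

# Made-up sentences, one per sense of "flower"; not taken from any real corpus.
SENTENCES = [
    "the plant will flower in spring",              # verb
    "the flower is a part of the plant",            # botanical part
    "a tulip is a flower and a flower is a plant",  # flowering plant
]

# Assumed stopword list, just large enough for these sentences.
STOPWORDS = {"the", "a", "is", "of", "in", "and", "will"}


def cooccurrence(sentences):
    """Count unordered pairs of distinct content words sharing a sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.split()) - STOPWORDS)
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


counts = cooccurrence(SENTENCES)
# One number for the pair, although each sentence uses a different sense.
print(counts[("flower", "plant")])
```

The single count for ("flower", "plant") records that the two terms are associated, but nothing in it says whether a given occurrence was the verb, the part, or the whole plant.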
Now, as I said, IF you restrict the text corpora to a set of expositions
known to be careful, I think you might be able to extract an ontology
from them. And the work I have seen in this area seems primarily to be
directed to that approach. You may well be able to construct an
axiomatic ontology from a textbook on nuclear physics or French
literature. But the value of that ontology is probably directly
proportional to the degree to which that textbook is an accepted
reference, or at any rate matches common doctrine in that field. And
you can't determine that from the textbook itself, although it may well
have been determined by the human agent who elected to use that textbook
to construct the ontology.
-Ed
"The trick is to glean from an experience exactly the knowledge that is
contained in it. A cat which sits down on a hot stove will never do it
again, but it will never sit on a cold stove again either."
-- Mark Twain
--
Edward J. Barkmeyer Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263 Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263 FAX: +1 301-975-4694
"The opinions expressed above do not reflect consensus of NIST,
and have not been reviewed by any Government authority."
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/