I have to agree with PatH on this issue – which I
had supposed was fairly obvious to anyone who is trying to extract information
from text.
There may actually be, somewhere on the Web, similar
content to what is expressed in painstakingly accurate detail in Cyc, but as PatH
emphasizes, how will one recognize it and how will one use it? And
how will one distinguish fact from nonsense?
Saying that there is a lot of information to be had
on the Web is like saying that there are tons of gold in the ocean, and all one
has to do is go out there in a boat and get it.
Right.
The reason Google is useful is that some smart
people have figured out a way for other smart people to do a search that
eliminates most (but by no means all) of the noise in returning documents.
But that only works automatically if you have a machine that can understand
language as well as a college graduate. If you find one, do let us know
about it.
Build an ontology from the Web? Well, sure, if you
have a machine that understands language as well as a Ph.D. and has the
equivalent expertise of an experienced ontologist. Duh.
If anyone has actually seen an ontology of any kind
extracted from the Web that is actually useful for something, please let us
know. Saying that it is in theory possible tells us nothing.
Pat
Patrick Cassidy
MICRA, Inc.
908-561-3416
cell: 908-565-4053
cassidy@xxxxxxxxx
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Pat Hayes
Sent: Friday, February 01, 2008 2:06 AM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Axiomatic ontology
At 12:51 AM -0500 2/1/08, John F. Sowa wrote:
Pat,
The amount of "common sense" info on the WWW dwarfs Cyc by many
orders of magnitude.
Nonsense. But go ahead, prove me wrong: find some. I'd be
delighted if you could.
PH> Remember, common sense isn't encyclopedia knowledge: it's
> the 'obvious' stuff that any child knows about the way the
> world works, such as: things fall if they aren't supported;
> roofs protect you from rain unless it is very windy; if you
> fall into a liquid you will get wet; parties are events which
> people go to in order to socialize and have fun, and often
> though not always celebrate an event such as a birthday;
> objects are made of kinds of stuff; Jello is a kind of stuff;
> hitting something fragile is likely to break it if you hit it
> hard enough; being in a cold space makes you feel cold; you
> can walk on sand but it can also be poured, if it is dry;
> being covered with paint is similar in some ways to being
> covered with hair, but not in all ways; people like eating
> sugar; animals die if they stop eating or stop drinking; if
> you are in a room in a building then you are in the building,
> but if you are in a ship floating in water, you are not in
> the water; and so on and on and on.
It's not stated in a list of that sort, but it's implicit in
the way those words are being used.
Well, yes, that is true, in a sense; but just because it's
'implicit' doesn't mean it can be extracted by any algorithm. All claims like
this are just noise until someone actually finds a way to extract such
information. I know a lot of people are trying very hard, but nobody (including
you and Arun) has so far managed to actually do any of this. I don't
think most of the required content is even implicitly on the Web. Find
me, anywhere on the Web other than in Cyc, an account of the different senses
of 'cover' used in:
Or talk to a linguist about the many senses of
"in" used in English (approximately 30, though it is hard to be
exact), which require an ontology to be used in order to disambiguate
them.
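
The room/building versus ship/water example quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: the relation names `contained_in` and `floating_in`, the toy facts, and the helper `is_contained_in` are all hypothetical, and treating containment as transitive is itself an ontological choice that somebody has to make. Nothing in the word "in" supplies it.

```python
# Two hypothetical senses of English "in", modelled as distinct relations.
CONTAINED_IN = "contained_in"  # spatial containment, treated here as transitive
FLOATING_IN = "floating_in"    # supported by a liquid; not containment

# Toy facts, invented for illustration only.
FACTS = {
    ("person", CONTAINED_IN, "room"),
    ("room", CONTAINED_IN, "building"),
    ("sailor", CONTAINED_IN, "ship"),
    ("ship", FLOATING_IN, "water"),
}


def is_contained_in(x, y, facts=FACTS):
    """Return True if x reaches y by following contained_in links only."""
    frontier, seen = [x], set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for subject, relation, obj in facts:
            if subject == current and relation == CONTAINED_IN:
                if obj == y:
                    return True
                frontier.append(obj)
    return False


print(is_contained_in("person", "building"))  # follows two contained_in links
print(is_contained_in("sailor", "water"))     # blocked by the floating_in link
```

The inference goes through for the person in the building and is blocked for the sailor and the water only because the two uses of "in" were assigned to different relations by hand beforehand.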
You can test that claim by a crude but simple use of Google.
For example:
"parties are events which people go to in order to socialize and
have fun, and often though not always celebrate an event such as
a birthday"
Check pairs of words from that sentence on Google, look at the count
of hits, and check a few of the pages in the top hits:
party event -- 29,100,000 hits
party people -- 39,500,000
party socialize -- 1,280,000
party fun -- 34,000,000
party celebrate -- 14,200,000
This tells one nothing more than that the words are
associated. That is not enough to state a coherent proposition, let alone a
coherent piece of ontological content.
This is an enormous amount of data about how parties are related
to those other things.
Really? How does one extract information about
relationships from free text or word associations? Associations, remember,
are symmetrical.
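
The symmetry point can be made concrete with a small sketch. The toy corpus, the helper names `cooccurrence_counts` and `pair_count`, and the choice to count pairs per sentence are all assumptions made for illustration; this is not how Google computes hit counts. It only shows that a pair count is a property of an unordered pair, so it cannot say which word is the event, which is the participant, or what relation holds between them.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count unordered word pairs that occur together in one sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            # frozenset makes the key order-free: {a, b} == {b, a}
            counts[frozenset((a, b))] += 1
    return counts


def pair_count(counts, a, b):
    """Look up the count for a pair, in either order."""
    return counts[frozenset((a, b))]


# Hypothetical toy corpus, invented for illustration.
corpus = [
    "people celebrate at a party",
    "a party is fun",
    "people have fun",
]

counts = cooccurrence_counts(corpus)
print(pair_count(counts, "party", "people"))
print(pair_count(counts, "people", "party"))
```

Both lookups return the same number, and neither says whether people go to parties, parties go to people, or the two words merely turn up near each other.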
Furthermore, much more detailed information
about how they're related can be found in the top few pages.
I certainly do not recommend that one should use the Google method
of indexing to find the related information. What I would recommend
is the method of indexing described in the following paper by Arun
and me:
http://www.jfsowa.com/pubs/analog.htm
Analogical Reasoning
I don't believe that the information has to wait until somebody
"tags" it with XML tags. Tags are fine for various formal purposes.
But for commonsense info, I would rather start with the raw data,
as it is stated by people talking or writing for other people.
As I've already pointed out, most common sense is never said
or written to other people. And even for that which has been, I don't accept
that random free NL text is 'the raw data' for common sense. Even if we could
extract formalizable content from free text, the text does not tell us how to
represent it in an ontology (just think, for example, of all the alternative
ways of formalizing temporal relationships which are expressed by English
tenses). But in any case, any claims like this are at best a theoretical idea
which has not yet been tested. Many very smart people are trying very hard to
extract useful information from the unstructured Web, and they haven't managed
to do very much of it yet. I don't know of ANY methods or projects (including
analogical structure matching, by the way, which is being used actively by
dozens of people at Northwestern, where it was invented) which can be said to
reliably extract a single nontrivial ontological proposition from the entire
Web. If anyone reading this knows of one, by all means tell us about it. Not an
idea or a belief, but a working system.
John
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
--
---------------------------------------------------------------------
IHMC                      (850)434 8903 or (650)494 3973 home
40 South Alcaniz St.      (850)202 4416 office
Pensacola                 (850)202 4440 fax
FL 32502                  (850)291 0667 cell
phayesAT-SIGNihmc.us      http://www.ihmc.us/users/phayes