This mail is publicly posted to a distribution list as part of a process
of public discussion, any automatically generated statements to the
contrary notwithstanding. It is the opinion of the author and does not
represent an official company view. (01)
I would like to agree with Pat on this. According to Google, I am most
likely to be a basketball player or a science fiction character.
Conversely, it is also most unlikely that anyone except me could
accurately identify which of the 231,000 hits relates to me (I'm as vain
as anyone). (02)
More pertinently, from the tried and tested viewpoint of industrial data
exchange, we cannot rely on words meaning the same thing even inside a
single company. Even where we see the same words used in similar ways,
we find the instance sets have a non-empty symmetric difference ((A Union
B) - (A Intersect B)); that is, people apply the same words to different
sets of things. (03)
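To make the set expression concrete, here is a minimal Python sketch. The two departments, the term "fastener" and the part names are all invented for illustration; only the set expression itself comes from the text above.

```python
# Hypothetical instance sets: what two departments in one company
# would each accept as a "fastener". All names are invented.
design_fasteners = {"bolt", "rivet", "screw", "weld stud"}
stores_fasteners = {"bolt", "rivet", "screw", "cable tie"}

# (A Union B) - (A Intersect B): the items only one side counts as a fastener.
disputed = (design_fasteners | stores_fasteners) - (design_fasteners & stores_fasteners)

# Python's symmetric-difference operator gives the same set.
assert disputed == design_fasteners ^ stores_fasteners

print(sorted(disputed))  # ['cable tie', 'weld stud']
```

A non-empty result means the word is shared but its extension is not, which is exactly the case a data exchange has to detect before trusting the term.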
Generating ontologies from the Web seems to me to be useful only in
low-risk situations, that is, ones where (the impact of a mistake) times
(the probability of mistaking a term) is a low cost relative to one's
budget. It may be OK for ordering pizzas, but I wouldn't use it for
buying aircraft parts. (04)
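The rule of thumb above can be sketched in a few lines of Python. The function names, the 1% tolerance and every figure below are assumptions made up for illustration, not numbers from any real procurement.

```python
def expected_cost(impact: float, p_mistake: float) -> float:
    """(Impact of a mistake) times (probability of mistaking a term)."""
    if not 0.0 <= p_mistake <= 1.0:
        raise ValueError("p_mistake must be a probability in [0, 1]")
    return impact * p_mistake


def is_low_risk(impact: float, p_mistake: float, budget: float,
                tolerance: float = 0.01) -> bool:
    """True if the expected cost is at most `tolerance` (an assumed
    fraction, here 1%) of the budget."""
    return expected_cost(impact, p_mistake) <= tolerance * budget


# Invented figures: the same 5% chance of misreading a term in both cases.
pizza = is_low_risk(impact=20.0, p_mistake=0.05, budget=1_000.0)
aircraft_part = is_low_risk(impact=5_000_000.0, p_mistake=0.05,
                            budget=10_000_000.0)

print(pizza, aircraft_part)  # True False
```

With the same probability of error, only the size of the impact relative to the budget changes the answer.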
Sean Barker
BAE SYSTEMS - Advanced Technology Centre
Bristol, UK
+44(0) 117 302 8184 (05)
BAE Systems (Operations) Limited
Registered Office: Warwick House, PO Box 87, Farnborough Aerospace
Centre, Farnborough, Hants, GU14 6YU, UK
Registered in England & Wales No: 1996687 (06)
________________________________ (07)
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Pat Hayes
Sent: 01 February 2008 07:06
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Axiomatic ontology (08)
At 12:51 AM -0500 2/1/08, John F. Sowa wrote: (011)
Pat, (012)
The amount of "common sense" info on the WWW dwarfs Cyc by many
orders of magnitude. (014)
Nonsense. But go ahead, prove me wrong: find some. I'd be
delighted if you could. (015)
PH> Remember, common sense isn't encyclopedia knowledge: it's
> the 'obvious' stuff that any child knows about the way the
> world works, such as: things fall if they aren't supported;
> roofs protect you from rain unless it is very windy; if you
> fall into a liquid you will get wet; parties are events which
> people go to in order to socialize and have fun, and often
> though not always celebrate an event such as a birthday;
> objects are made of kinds of stuff; Jello is a kind of stuff;
> hitting something fragile is likely to break it if you hit it
> hard enough; being in a cold space makes you feel cold; you
> can walk on sand but it can also be poured, if it is dry;
> being covered with paint is similar in some ways to being
> covered with hair, but not in all ways; people like eating
> sugar; animals die if they stop eating or stop drinking; if
> you are in a room in a building then you are in the building,
> but if you are in a ship floating in water, you are not in
> the water; and so on and on and on. (016)
It's not stated in a list of that sort, but it's implicit in
the way those words are being used. (018)
Well, yes, that is true, in a sense; but just because it's 'implicit'
doesn't mean it can be extracted by any algorithm. All claims like this
are just noise until someone actually finds a way to extract such
information. I know a lot of people are trying very hard, but nobody
(including you and Arun) has so far managed to actually do any of this.
I don't think most of the required content is even implicitly on the
Web. Find me, anywhere on the Web other than in Cyc, an account of the
different senses of 'cover' used in: (019)
cover with paint
cover with skin
cover with hair
cover with a sheet
cover with dust
under cover
taking cover (020)
Or talk to a linguist about the many senses of "in" used in
English (approximately 30, though it is hard to be exact), which require
an ontology to be used in order to disambiguate them. (021)
You can test that claim by a crude, but simple use of Google. (022)
For example: (023)
"parties are events which people go to in order to socialize and
have fun, and often though not always celebrate an event such as
a birthday" (024)
Check pairs of words from that sentence on Google, look at the count
of hits, and check a few of the pages in the top hits: (025)
party event -- 29,100,000 hits (026)
party people -- 39,500,000 (027)
party socialize -- 1,280,000 (028)
party fun -- 34,000,000 (029)
party celebrate -- 14,200,000 (030)
This tells one nothing more than that the words are associated.
That is not enough to state a coherent proposition, let alone a coherent
piece of ontological content. (031)
This is an enormous amount of data about how parties are related
to those other things. (033)
Really? How does one extract information about relationships
from free text or word associations? Associations, remember, are
symmetrical. (034)
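The point about symmetry can be seen in a minimal Python sketch. The three-sentence corpus and the function name are invented for illustration, and a count of documents containing both words is only a rough stand-in for a search engine's hit count.

```python
# Toy corpus, invented for illustration.
documents = [
    "people go to a party to have fun",
    "the party was fun and people danced",
    "people celebrate a birthday at a party",
]


def cooccurrence(x: str, y: str) -> int:
    """Number of documents containing both words, in any order or role."""
    return sum(1 for doc in documents if {x, y} <= set(doc.split()))


# The count is the same whichever word is put first.
print(cooccurrence("party", "people"), cooccurrence("people", "party"))  # 3 3
```

Nothing in the number distinguishes "people go to parties" from "parties go to people"; the direction and kind of the relationship would have to come from somewhere other than the count.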
Furthermore, much more detailed information
about how they're related can be found in the top few pages. (035)
I certainly do not recommend that one should use the Google method
of indexing to find the related information. What I would recommend
is the method of indexing described in the following paper by Arun
and me: (036)
http://www.jfsowa.com/pubs/analog.htm
Analogical Reasoning (037)
I don't believe that the information has to wait until somebody
"tags" it with XML tags. Tags are fine for various formal purposes.
But for commonsense info, I would rather start with the raw data,
as it is stated by people talking or writing for other people. (038)
As I've already pointed out, most common sense is never said or
written to other people. And even for that which has been, I don't
accept that random free NL text is 'the raw data' for common sense. Even
if we could extract formalizable content from free text, the text does
not tell us how to represent it in an ontology (just think, for example,
of all the alternative ways of formalizing the temporal relationships
which are expressed by English tenses). But in any case, claims like this
are at best a theoretical idea which has not yet been tested. Many very
smart people are trying very hard to extract useful information from the
unstructured Web, and they haven't managed to do very much of it yet. I
don't know of ANY methods or projects (including analogical structure
matching, by the way, which is being used actively by dozens of people
at Northwestern, where it was invented) which can be said to reliably
extract a single nontrivial ontological proposition from the entire Web.
If anyone reading this knows of one, by all means tell us about it. Not
an idea or a belief, but a working system. (039)
Pat (040)
John (041)
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx (042)
-- (043)
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 cell
phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes (044)