Re: [ontolog-forum] Axiomatic ontology

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Patrick Cassidy" <pat@xxxxxxxxx>
Date: Fri, 1 Feb 2008 10:49:35 -0500
Message-id: <04ac01c864ea$0b682a80$22387f80$@com>

  I have to agree with PatH on this issue – which I had supposed was fairly obvious to anyone who is trying to extract information from text.

  There may actually be, somewhere on the Web, similar content to what is expressed in painstakingly accurate detail in Cyc, but as PatH emphasizes – how will one recognize it and how will one use it?  And how will one distinguish fact from nonsense?

   Saying that there is a lot of information to be had on the Web is like saying that there are tons of gold in the ocean, and all one has to do is go out there in a boat and get it.


   The reason Google is useful is that some smart people have figured out a way for other smart people to do a search that eliminates most (but by no means all) of the noise in returning documents.  But that only works automatically if you have a machine that can understand language as well as a college graduate.  If you find one, do let us know about it.

  Build an ontology from the Web?  Well, sure, if you have a machine that understands language as well as a Ph.D. and has the equivalent expertise of an experienced ontologist.  Duh.

   If anyone has actually seen an ontology of any kind extracted from the Web that is actually useful for something, please let us know.  Saying that it is in theory possible tells us nothing.




Patrick Cassidy



cell: 908-565-4053



From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Pat Hayes
Sent: Friday, February 01, 2008 2:06 AM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Axiomatic ontology


At 12:51 AM -0500 2/1/08, John F. Sowa wrote:


The amount of "common sense" info on the WWW dwarfs Cyc by many
orders of magnitude.


Nonsense. But go ahead, prove me wrong: find some. I'd be delighted if you could.


PH> Remember, common sense isn't encyclopedia knowledge: it's
 > the 'obvious' stuff that any child knows about the way the
 > world works, such as: things fall if they aren't supported;
 > roofs protect you from rain unless it is very windy; if you
 > fall into a liquid you will get wet; parties are events which
 > people go to in order to socialize and have fun, and often
 > though not always celebrate an event such as a birthday;
 > objects are made of kinds of stuff; Jello is a kind of stuff;
 > hitting something fragile is likely to break it if you hit it
 > hard enough; being in a cold space makes you feel cold; you
 > can walk on sand but it can also be poured, if it is dry;
 > being covered with paint is similar in some ways to being
 > covered with hair, but not in all ways; people like eating
 > sugar; animals die if they stop eating or stop drinking; if
 > you are in a room in a building then you are in the building,
 > but if you are in a ship floating in water, you are not in
 > the water; and so on and on and on.

It's not stated in a list of that sort, but it's implicit in
the way those words are being used.


Well, yes, that is true, in a sense; but the fact that it's 'implicit' doesn't mean it can be extracted by any algorithm. All claims like this are just noise until someone actually finds a way to extract such information. I know a lot of people are trying very hard, but nobody (including you and Arun) has so far managed to actually do any of this. I don't think most of the required content is even implicitly on the Web. Find me, anywhere on the Web other than in Cyc, an account of the different senses of 'cover' used in:


cover with paint

cover with skin

cover with hair

cover with a sheet

cover with dust

under cover

taking cover


Or talk to a linguist about the many senses of "in" used in English (approximately 30, though it is hard to be exact), which require an ontology to be used in order to disambiguate them.


You can test that claim
by a crude but simple use of Google.

For example:

    "parties are events which people go to in order to socialize and
    have fun, and often though not always celebrate an event such as
    a birthday"

Check pairs of words from that sentence on Google, look at the count
of hits, and check a few of the pages in the top hits:

    party event -- 29,100,000 hits

    party people -- 39,500,000

    party socialize -- 1,280,000

    party fun -- 34,000,000

    party celebrate -- 14,200,000


This tells one nothing more than that the words are associated. That is not enough to state a coherent proposition, let alone a coherent piece of ontological content.


This is an enormous amount of data about how parties are related
to those other things.


Really? How does one extract information about relationships from free text or word associations? Associations, remember, are symmetrical.
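
To make the symmetry point concrete, here is a minimal sketch in Python. The three-sentence corpus and the names `cooccurrence_counts` and `association` are invented for illustration and are not taken from any system mentioned in this thread. It counts how often two words occur in the same sentence, which is roughly the kind of evidence a hit count provides. Because a pair is stored without order, the score for (party, fun) is necessarily the score for (fun, party), so the numbers alone cannot say which relation holds between the two, or in which direction.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count unordered word pairs that appear together in a sentence."""
    counts = Counter()
    for sentence in sentences:
        # Use a set so a repeated word counts once per sentence.
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            # frozenset discards order: {party, fun} == {fun, party}.
            counts[frozenset((a, b))] += 1
    return counts


def association(counts, a, b):
    """Look up the co-occurrence count for a pair, in either order."""
    return counts[frozenset((a, b))]


# Hypothetical toy corpus, standing in for pages returned by a search.
corpus = [
    "people go to a party to have fun",
    "the party was no fun for the people cleaning up",
    "fun people skipped the party",
]

counts = cooccurrence_counts(corpus)

# The two numbers printed are always equal, whatever the corpus.
print(association(counts, "party", "fun"), association(counts, "fun", "party"))
```

Note that the second and third sentences say nearly the opposite of the first, yet all three contribute equally to the count.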


  Furthermore, much more detailed information
about how they're related can be found in the top few pages.

I certainly do not recommend that one should use the Google method
of indexing to find the related information.  What I would recommend
is the method of indexing described in the following paper by Arun
and me:

    Analogical Reasoning

I don't believe that the information has to wait until somebody
"tags" it with XML tags.  Tags are fine for various formal purposes.
But for commonsense info, I would rather start with the raw data,
as it is stated by people talking or writing for other people.


As I've already pointed out, most common sense is never said or written to other people. And even for that which has been, I don't accept that random free NL text is 'the raw data' for common sense. Even if we could extract formalizable content from free text, the text does not tell us how to represent it in an ontology (just think, for example, of all the alternative ways of formalizing temporal relationships which are expressed by English tenses). But in any case, any claims like this are at best a theoretical idea which has not yet been tested. Many very smart people are trying very hard to extract useful information from the unstructured Web, and they haven't managed to do very much of it yet. I don't know of ANY methods or projects (including analogical structure matching, by the way, which is being used actively by dozens of people at Northwestern, where it was invented) which can be said to reliably extract a single nontrivial ontological proposition from the entire Web. If anyone reading this knows of one, by all means tell us about it. Not an idea or a belief, but a working system.





Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/ 
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/ 
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx




IHMC               (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.       (850)202 4416   office
Pensacola                 (850)202 4440   fax
FL 32502                     (850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
