Re: [ontolog-forum] Interesting way of using Google

To:	"[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From:	Jenny ure <jure2@xxxxxxxxxxxx>
Date:	Mon, 27 Aug 2007 18:05:18 +0100
Message-id:	<46D3044E.5000507@xxxxxxxxxxxx>

This is interesting. It is similar to the use of 'concordancing' in language teaching: mining examples of words in context from multiple texts as a means of conveying current meaning in context.

Caveats might be that
- the same extraction process will give different results over time, i.e. each result is a snapshot of an evolving process
- averaging out is only valid within the same population. Across different populations it obscures real differences and, contrary to intuitive assumptions, epidemiologists will tell you that this can introduce a misleading bias.

Joseph Goguen had some nice caveats for seekers after 'a single unified ontology that attracts consensus because it "reflects the real underlying reality"', which some might see this method as providing.

He argued instead that if the need for conceptual diversity is accepted, it then follows that 'knowledge engineering should seek ways to support it, rather than ways to overcome, suppress, or subvert it' by providing 'support for multiple evolving ontologies for single domains, accepting that translations among such theories will necessarily be partial and incomplete and providing tools to help construct such partial mappings'.

Jenny Ure

John F. Sowa wrote:

Since this forum has been unusually quiet for the past week,
I thought I'd send a note to keep it from gathering dust.

Following is the abstract, URL, and an excerpt from an article
that presents an interesting method for deriving semantic
information from Google page counts.

John
_________________________________________________________

Source: http://homepages.cwi.nl/~paulv/papers/amdug.pdf

Automatic Meaning Discovery Using Google

Rudi Cilibrasi, CWI

Paul Vitanyi, CWI, University of Amsterdam,
National ICT of Australia

Abstract

We have found a method to automatically extract the meaning of words
and phrases from the world-wide-web using Google page counts. The
approach is novel in its unrestricted problem domain, simplicity
of implementation, and manifestly ontological underpinnings. The
world-wide-web is the largest database on earth, and the latent
semantic context information entered by millions of independent
users averages out to provide automatic meaning of useful quality.
We demonstrate positive correlations, evidencing an underlying
semantic structure, in both numerical symbol notations and
number-name words in a variety of natural languages and contexts.
Next, we demonstrate the ability to distinguish between colors
and numbers, and to distinguish between 17th century Dutch painters;
the ability to understand electrical terms, religious terms, and
emergency incidents; we conduct a massive experiment in understanding
WordNet categories; and finally we demonstrate the ability to do a
simple automatic English-Spanish translation.

[Excerpt from Section 1 of the above paper]

At the time of writing, Google searches 8,058,044,651 web pages.
Define the joint event x ∩ y = {w : x, y ∈ w} as the set of web pages
returned by Google, containing both the search term x and the search
term y.  The joint probability p(x,y) = |{w : x, y ∈ w}|/M is the
number of web pages in the joint event divided by the overall number
M of web pages possibly returned by Google.  This notation also
allows us to define the probability p(x|y) of conditional events
x|y = (x ∩ y)/y defined by p(x|y) = p(x,y)/p(y).

In the above example we have therefore p(horse) ~ 0.0058,
p(rider) ~ 0.0015, p(horse, rider) ~ 0.0003.  We conclude that
the probability p(horse|rider) of “horse” accompanying “rider” is
~ 1/5 and the probability p(rider|horse) of “rider” accompanying
“horse” is ~ 1/19. The probabilities are asymmetric, and it is the
lesser probability that is the significant one.
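
The arithmetic in the excerpt can be checked with a short Python sketch. It uses only the rounded probabilities quoted above and the definition p(x|y) = p(x,y)/p(y). The function name `conditional` and the variable names are illustrative assumptions and do not come from the paper.

```python
# Rounded probabilities as quoted in the excerpt (not fresh Google counts).
p_horse = 0.0058
p_rider = 0.0015
p_horse_and_rider = 0.0003


def conditional(p_joint: float, p_given: float) -> float:
    """Return p(x|y) = p(x,y) / p(y). Hypothetical helper for illustration."""
    if p_given <= 0:
        raise ValueError("p(y) must be positive")
    return p_joint / p_given


# p(horse|rider): how often "horse" accompanies "rider" (about 1/5).
p_horse_given_rider = conditional(p_horse_and_rider, p_rider)
# p(rider|horse): how often "rider" accompanies "horse" (about 1/19).
p_rider_given_horse = conditional(p_horse_and_rider, p_horse)

# The excerpt singles out the lesser of the two as the significant one.
significant = min(p_horse_given_rider, p_rider_given_horse)

print(f"p(horse|rider) = {p_horse_given_rider:.3f}")
print(f"p(rider|horse) = {p_rider_given_horse:.3f}")
print(f"lesser probability = {significant:.3f}")
```

The asymmetry comes entirely from the different denominators: the same joint probability is divided by p(rider) in one case and by the larger p(horse) in the other.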



 
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx

