[ontolog-forum] Interesting way of using Google

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Sun, 26 Aug 2007 18:39:31 -0400
Message-id: <46D20123.3080202@xxxxxxxxxxx>

Since this forum has been unusually quiet for the past week,
I thought I'd send a note to keep it from gathering dust.    (01)

Following is the abstract, URL, and an excerpt from an article
that presents an interesting method for deriving semantic
information from Google page counts.    (02)

_________________________________________________________    (03)

Source: http://homepages.cwi.nl/~paulv/papers/amdug.pdf    (04)

Automatic Meaning Discovery Using Google    (05)

Rudi Cilibrasi, CWI    (06)

Paul Vitanyi, CWI, University of Amsterdam,
National ICT of Australia    (07)

Abstract    (08)

We have found a method to automatically extract the meaning of words
and phrases from the world-wide-web using Google page counts. The
approach is novel in its unrestricted problem domain, simplicity
of implementation, and manifestly ontological underpinnings. The
world-wide-web is the largest database on earth, and the latent
semantic context information entered by millions of independent
users averages out to provide automatic meaning of useful quality.
We demonstrate positive correlations, evidencing an underlying
semantic structure, in both numerical symbol notations and
number-name words in a variety of natural languages and contexts.
Next, we demonstrate the ability to distinguish between colors
and numbers, and to distinguish between 17th century Dutch painters;
the ability to understand electrical terms, religious terms, and
emergency incidents; we conduct a massive experiment in understanding
WordNet categories; and finally we demonstrate the ability to do a
simple automatic English-Spanish translation.    (09)

[Excerpt from Section 1 of the above paper]    (010)

At the time of writing, Google searches 8,058,044,651 web pages.
Define the joint event x ∩ y = {w : x,y ∈ w} as the set of web pages
returned by Google, containing both the search term x and the search
term y.  The joint probability p(x,y) = |{w : x,y ∈ w}|/M is the
number of web pages in the joint event divided by the overall number
M of web pages possibly returned by Google.  This notation also
allows us to define the probability p(x|y) of conditional events
x|y = (x ∩ y)/y defined by p(x|y) = p(x,y)/p(y).    (011)
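
As a rough illustration of the definitions in this paragraph, the
following Python sketch computes p(x), p(x,y), and p(x|y) from raw
page counts. The function names and the page counts in the usage
lines are hypothetical; only the value of M is taken from the excerpt.

```python
# Total number of indexed pages M, as quoted in the excerpt.
M = 8_058_044_651


def probability(count, total=M):
    """p(x) or p(x,y): a page count divided by the total number of pages."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


def conditional(joint_count, given_count):
    """p(x|y) = p(x,y)/p(y); the factor M cancels, leaving a ratio of counts."""
    if given_count <= 0:
        raise ValueError("given_count must be positive")
    return joint_count / given_count


# Hypothetical page counts, chosen only to make the arithmetic easy to follow.
count_x, count_y, count_xy = 4_000, 1_000, 500

p_x_given_y = conditional(count_xy, count_y)  # 500 / 1000
p_y_given_x = conditional(count_xy, count_x)  # 500 / 4000
```

Because M cancels in the ratio, each conditional probability depends
only on the joint count and the count of the conditioning term, so the
two directions generally give different values.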

In the above example we have therefore p(horse) ~ 0.0058,
p(rider) ~ 0.0015, p(horse, rider) ~ 0.0003.  We conclude that
the probability p(horse|rider) of “horse” accompanying “rider” is
~ 1/5 and the probability p(rider|horse) of “rider” accompanying
“horse” is ~ 1/19. The probabilities are asymmetric, and it is the
lesser probability that is the significant one.    (012)
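
The arithmetic behind the two conditional probabilities can be checked
with a few lines of Python. This is only a sketch: the three inputs are
the rounded probabilities quoted in the paragraph, not the underlying
page counts, and the variable names are made up for illustration.

```python
# Rounded probabilities as quoted in the excerpt.
p_horse = 0.0058
p_rider = 0.0015
p_joint = 0.0003  # p(horse, rider)

# p(x|y) = p(x,y) / p(y)
p_horse_given_rider = p_joint / p_rider  # about 1/5
p_rider_given_horse = p_joint / p_horse  # about 1/19

# The excerpt treats the lesser of the two as the significant one.
significant = min(p_horse_given_rider, p_rider_given_horse)
```

With these rounded inputs the second ratio works out to roughly 0.052,
which is the "~ 1/19" of the text.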

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx    (013)