This is interesting. It is similar to the use of 'concordancing' in language
teaching: mining examples of words in context from multiple
texts as a means of conveying current meaning in context.
Caveats might be that
- the same extraction process will give different results over time,
  i.e. it is a snapshot of an evolving process
- averaging out is only valid within the same population. Across
  different populations it obscures real differences and, contrary to
  intuitive assumptions, epidemiologists will tell you that this can
  introduce a misleading bias.
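To make the second caveat concrete, here is a minimal sketch. The two
populations and every count in it are invented for illustration; they do
not come from Google or from the paper. It estimates p(horse|rider) within
each population and then again after pooling them.

```python
# Illustrative only: every count below is invented, not taken from Google.
populations = {
    "A": {"rider": 1_000, "horse_and_rider": 500},
    "B": {"rider": 9_000, "horse_and_rider": 90},
}


def conditional(joint_count: int, given_count: int) -> float:
    """Estimate p(x|y) as (pages with both x and y) / (pages with y)."""
    return joint_count / given_count


# Rate within each population taken separately.
per_population = {
    name: conditional(c["horse_and_rider"], c["rider"])
    for name, c in populations.items()
}

# Rate after pooling the two populations into one.
pooled = conditional(
    sum(c["horse_and_rider"] for c in populations.values()),
    sum(c["rider"] for c in populations.values()),
)

for name, rate in per_population.items():
    print(f"population {name}: p(horse|rider) = {rate:.3f}")
print(f"pooled:       p(horse|rider) = {pooled:.3f}")
```

With these made-up numbers the pooled estimate is about 0.06, which
describes neither population (0.5 and 0.01) and is dominated by whichever
one contributes more pages.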
Joseph Goguen had some nice caveats for seekers after 'a
single unified ontology that attracts consensus
because it "reflects the real underlying reality"', which some might
see this as providing.
He argued instead that if the need for conceptual diversity is
accepted, it then follows that 'knowledge engineering should seek
ways to support it, rather than ways to overcome, suppress, or
subvert it' by providing 'support for multiple evolving ontologies
for single domains, accepting that translations among such theories
will necessarily be partial and incomplete, and providing tools to
help construct such partial mappings'.
Jenny Ure
John F. Sowa wrote:
Since this forum has been unusually quiet for the past week,
I thought I'd send a note to keep it from gathering dust.
Following is the abstract, URL, and an excerpt from an article
that presents an interesting method for deriving semantic
information from Google page counts.
John
_________________________________________________________
Source: http://homepages.cwi.nl/~paulv/papers/amdug.pdf
Automatic Meaning Discovery Using Google
Rudi Cilibrasi, CWI
Paul Vitanyi, CWI, University of Amsterdam,
National ICT of Australia
Abstract
We have found a method to automatically extract the meaning of words
and phrases from the world-wide-web using Google page counts. The
approach is novel in its unrestricted problem domain, simplicity
of implementation, and manifestly ontological underpinnings. The
world-wide-web is the largest database on earth, and the latent
semantic context information entered by millions of independent
users averages out to provide automatic meaning of useful quality.
We demonstrate positive correlations, evidencing an underlying
semantic structure, in both numerical symbol notations and
number-name words in a variety of natural languages and contexts.
Next, we demonstrate the ability to distinguish between colors
and numbers, and to distinguish between 17th century Dutch painters;
the ability to understand electrical terms, religious terms, and
emergency incidents; we conduct a massive experiment in understanding
WordNet categories; and finally we demonstrate the ability to do a
simple automatic English-Spanish translation.
[Excerpt from Section 1 of the above paper]
At the time of writing, Google searches 8,058,044,651 web pages.
Define the joint event x ∩ y = {w : x, y ∈ w} as the set of web pages
returned by Google, containing both the search term x and the search
term y. The joint probability p(x,y) = |{w : x, y ∈ w}|/M is the
number of web pages in the joint event divided by the overall number
M of web pages possibly returned by Google. This notation also
allows us to define the probability p(x|y) of conditional events
x|y = (x ∩ y)/y defined by p(x|y) = p(x,y)/p(y).
In the above example we have therefore p(horse) ~ 0.0058,
p(rider) ~ 0.0015, p(horse, rider) ~ 0.0003. We conclude that
the probability p(horse|rider) of “horse” accompanying “rider” is
~ 1/5 and the probability p(rider|horse) of “rider” accompanying
“horse” is ~ 1/19. The probabilities are asymmetric, and it is the
lesser probability that is the significant one.
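The arithmetic in the excerpt can be checked with a short sketch. It uses
only the rounded probabilities and the value of M quoted above; the
function names are my own, and the raw page counts are not given in the
excerpt, so joint_probability is shown only as the definition would be
applied if they were.

```python
M = 8_058_044_651  # number of pages searched, as quoted in the excerpt


def joint_probability(page_count: int, total_pages: int = M) -> float:
    """p(x,y) = (pages containing both x and y) / M."""
    return page_count / total_pages


def conditional_probability(p_joint: float, p_given: float) -> float:
    """p(x|y) = p(x,y) / p(y)."""
    return p_joint / p_given


# Rounded values quoted in the excerpt.
p_horse = 0.0058
p_rider = 0.0015
p_horse_rider = 0.0003

p_horse_given_rider = conditional_probability(p_horse_rider, p_rider)
p_rider_given_horse = conditional_probability(p_horse_rider, p_horse)

print(f"p(horse|rider) = {p_horse_given_rider:.3f}, about 1/{1 / p_horse_given_rider:.0f}")
print(f"p(rider|horse) = {p_rider_given_horse:.3f}, about 1/{1 / p_rider_given_horse:.0f}")
```

The two results differ because the same joint probability is divided by
different marginals, which is the asymmetry the excerpt points out.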
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx