[ontolog-forum] Using Wikipedia as a Folksonomy

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Tue, 13 Mar 2007 16:02:56 -0500
Message-id: <45F71180.7000802@xxxxxxxxxxx>
The Wikipedia is currently the largest informally defined result
of collaborative tagging.  Many people have criticized it for its
lack of supervision and the uneven quality of many of its articles.
Yet it does serve as a convenient body of texts that have been
classified informally -- over 400 million words grouped in
over one million articles.  The title of each article is a
tag that classifies the article.    (01)

Following is an article about using Wikipedia as a resource of
tagged articles:    (02)

Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis    (03)

This illustrates the kind of work that can be done with
such resources.    (04)

John Sowa    (05)

----------------------------------------------------------------    (06)

Computing Semantic Relatedness using
Wikipedia-based Explicit Semantic Analysis    (07)

Evgeniy Gabrilovich and Shaul Markovitch    (08)

Department of Computer Science
Technion—Israel Institute of Technology, 32000 Haifa, Israel    (09)

Abstract    (010)

Computing semantic relatedness of natural language
texts requires access to vast amounts of
common-sense and domain-specific world knowledge.
We propose Explicit Semantic Analysis (ESA),
a novel method that represents the meaning
of texts in a high-dimensional space of concepts
derived from Wikipedia. We use machine learning
techniques to explicitly represent the meaning of
any text as a weighted vector of Wikipedia-based
concepts. Assessing the relatedness of texts in
this space amounts to comparing the corresponding
vectors using conventional metrics (e.g., cosine).
Compared with the previous state of the art, using
ESA results in substantial improvements in correlation
of computed relatedness scores with human
judgments: from r = 0.56 to 0.75 for individual
words and from r = 0.60 to 0.72 for texts. Importantly,
due to the use of natural concepts, the ESA
model is easy to explain to human users.    (011)
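
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three toy "concepts" stand in for Wikipedia articles, the TF-IDF weighting is one common choice assumed here, and every name in it (CONCEPTS, build_index, concept_vector, cosine, relatedness) is hypothetical. Each text becomes a weighted vector over concept titles, and two texts are compared by the cosine of their vectors.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy concepts (title -> article text). A real system would
# use the full set of Wikipedia articles instead.
CONCEPTS = {
    "Bank (finance)": "bank money loan deposit interest account money",
    "River": "river water bank flow stream water",
    "Computer": "computer software hardware program data",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberate simplification)."""
    return text.lower().split()


def build_index(concepts):
    """Build an inverted index: word -> {concept title: TF-IDF weight}."""
    n = len(concepts)
    tf = {title: Counter(tokenize(body)) for title, body in concepts.items()}
    # Document frequency: in how many concepts each word occurs.
    df = Counter(word for counts in tf.values() for word in counts)
    index = defaultdict(dict)
    for title, counts in tf.items():
        for word, count in counts.items():
            index[word][title] = count * math.log(n / df[word])
    return index


def concept_vector(text, index):
    """Represent a text as a weighted vector of concepts."""
    vec = defaultdict(float)
    for word in tokenize(text):
        for title, weight in index.get(word, {}).items():
            vec[title] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


INDEX = build_index(CONCEPTS)


def relatedness(a, b, index=INDEX):
    """Relatedness of two texts: cosine of their concept vectors."""
    return cosine(concept_vector(a, index), concept_vector(b, index))


if __name__ == "__main__":
    print(relatedness("money loan", "deposit interest"))
    print(relatedness("money loan", "river water"))
```

Because every dimension of the vector is a named concept, the largest entries of concept_vector(text, INDEX) can be read directly as an explanation of what the text is about, which is the property the abstract points to in its last sentence.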

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx    (012)
