Radinsky et al. also had a very interesting JAIR paper, reported in early January 2013 (published December 2012):
K. Radinsky, S. Davidovich and S. Markovitch (2012) "Learning to Predict from Textual Data", Journal of Artificial Intelligence Research, Volume 45, pages 641-684.
http://www.jair.org/papers/paper3865.html
Abstract
Given a current news event, we tackle the problem of generating plausible predictions
of future events it might cause. We present a new methodology for modeling and predicting
such future news events using machine learning and data mining techniques. Our Pundit
algorithm generalizes examples of causality pairs to infer a causality predictor. To obtain
precisely labeled causality examples, we mine 150 years of news articles and apply semantic
natural language modeling techniques to headlines containing certain predefined causality
patterns. For generalization, the model uses a vast number of world knowledge ontologies.
Empirical evaluation on real news articles shows that our Pundit algorithm performs as
well as non-expert humans.
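As a rough illustration of the pattern-mining step the abstract describes, the sketch below pulls (cause, effect) pairs out of headlines with a few regular expressions. The patterns, the function name extract_causality_pair, and the sample headlines are all assumptions made up for illustration; they are not the paper's actual patterns or the Pundit implementation, which also applies semantic modeling and ontology-based generalization.

```python
import re

# Hypothetical causality patterns (assumed for illustration; the paper's
# actual predefined patterns are not given in the abstract).
# Each pattern captures a named "cause" group and a named "effect" group.
CAUSALITY_PATTERNS = [
    # "<effect> after/following/because of/due to <cause>"
    re.compile(
        r"^(?P<effect>.+?)\s+(?:after|following|because of|due to)\s+(?P<cause>.+)$",
        re.IGNORECASE,
    ),
    # "<cause> causes/leads to/triggers <effect>"
    re.compile(
        r"^(?P<cause>.+?)\s+(?:causes|leads to|triggers)\s+(?P<effect>.+)$",
        re.IGNORECASE,
    ),
]


def extract_causality_pair(headline):
    """Return (cause, effect) if the headline matches a pattern, else None."""
    text = headline.strip().rstrip(".")
    for pattern in CAUSALITY_PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("cause").strip(), match.group("effect").strip()
    return None


# Made-up sample headlines, for illustration only.
headlines = [
    "Cholera outbreak reported after severe drought",
    "Earthquake triggers tsunami warning",
    "Markets close for holiday",
]

# Keep only the headlines that matched one of the patterns.
pairs = [p for p in map(extract_causality_pair, headlines) if p is not None]
```

Surface patterns like these would only produce raw labeled pairs; generalizing from them to unseen events is the part the paper handles with ontologies.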
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx]
On Behalf Of Ali SH
Sent: Tuesday, February 05, 2013 2:35 PM
To: [ontolog-forum]
Subject: [ontolog-forum] Combining Machine Learning, Ontologies, Semantic Networks and DBPedia
This is a cool little project:
The system provides striking results when tested on historical data. For example, reports of droughts in Angola in 2006 triggered a warning about possible cholera outbreaks in the country, because previous events had taught the system that cholera outbreaks were more likely in years following droughts. A second warning about cholera in Angola was triggered by news reports of large storms in Africa in early 2007; less than a week later, reports appeared that cholera had become established. In similar tests involving forecasts of disease, violence, and significant numbers of deaths, the system’s warnings were correct between 70 and 90 percent of the time.
The system was built using 22 years of New York Times archives, from 1986 to 2007, but it also draws on data from the Web to learn about what leads up to major news events.
“One source we found useful was DBpedia, which is a structured form of the information inside Wikipedia constructed using crowdsourcing,” says Radinsky. “We can understand, or see, the location of the places in the news articles, how much money people earn there, and even information about politics.” Other sources included WordNet, which helps software understand the meaning of words, and OpenCyc, a database of common knowledge.