Sean,
> I have only had time to follow up reading suggestions sent up to
> Tuesday, and will take a few days to read through the various
> suggestions since then.
I'll suggest a few more.
> I had suspected that the answer would be semiotics, but the
> Wikipedia article reduced semiotics to the usual categories of
> syntax, semantics and pragmatics, which rather misses the point...
> Most useful was the pointer to the Stanford Encyclopaedia of
> Philosophy entry on Peirce's Theory of Signs.
I agree that the Stanford article is better than the Wikipedia
article, but Peirce himself had subdivided the subject into
grammar, logic proper, and rhetoric, which Charles Morris renamed
as syntax, semantics, and pragmatics.
However, Peirce didn't *reduce* semiotics (or, as he sometimes
spelled it, semeiotic) to those three subjects as they are
commonly taught today. He had a much broader conception of
each of the three.
In any case, it's always important to check anything from any
source (including and especially any encyclopedia). But
Wikipedia also has a more detailed analysis of Peirce's
classification of signs in the following article (which should
also be checked against Peirce's own words before accepting it):
http://en.wikipedia.org/wiki/Semiotic_elements_and_classes_of_signs_(Peirce)
> What was more surprising is that nobody mentioned Natural
> Language Processing, or systems such as CYC, where I had
> thought knowledge is used to disambiguate sentences.
Unfortunately, much of the energy that drove NLP in the 1980s
has since been diverted into statistical methods. Those methods
have proved to be very useful for many purposes, but statistics,
by itself, can never produce semantics.
The so-called "latent semantics" may be useful for information
retrieval, but it is not semantics in any real sense: it cannot
explain what a sentence (or a document) means, and it supports
no further reasoning about what it finds.
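
To make the distinction concrete, here is a toy Python sketch of
the kind of "latent semantics" I mean (the corpus and the counts
are invented purely for illustration):

    # A toy version of latent semantic analysis: reduce a
    # term-document matrix by SVD and compare documents by
    # cosine similarity.
    import numpy as np

    # term order: cat, feline, dog, bank, money
    docs = {
        "d1": [2, 1, 0, 0, 0],   # about cats
        "d2": [1, 2, 0, 0, 0],   # also about cats
        "d3": [0, 0, 0, 2, 2],   # about banking
    }
    A = np.array(list(docs.values()), dtype=float).T   # terms x docs

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2                                   # keep 2 latent dimensions
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one vector per document

    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    names = list(docs)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            print(names[i], names[j],
                  round(cos(doc_vecs[i], doc_vecs[j]), 2))

The vectors rank d1 and d2 as highly similar and d3 as unrelated,
which is useful for retrieval. But nothing in those vectors
asserts anything that could be true or false, so there is nothing
to reason with.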
As a survey of the state of the art of NLP for information
extraction, I recommend the following article:
http://www.cs.utah.edu/~riloff/pdfs/NLPHandbook-IE.pdf
Information Extraction, by Jerry Hobbs and Ellen Riloff
This is Chapter 21 in the _Handbook of Natural Language Processing_
published in 2010, so it is reasonably up to date, and Jerry
Hobbs has been active in NLP since the 1970s. But the systems
that they reviewed and compared in that article all use templates
and statistical methods for IE.
Their concluding paragraph notes that those methods have reached
a barrier of about 60% accuracy (as measured by the geometric
mean of recall and precision):
> Good named entity recognition systems typically recognize about
> 90% of the entities of interest in a text, and this is near human
> performance. To recognize an event and its arguments requires
> recognizing about four entities, and 0.9^4 is about 60%. If this
> is the reason for the 60% barrier, it is not clear what we can
> do to overcome it, short of solving the general natural language
> problem in a way that exploits the implicit relations among the
> elements of a text.
In other words, they will have to go back to the more traditional
symbolic methods of knowledge representation -- but they'll have
to find more successful ways of dealing with performance issues.
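
Incidentally, the arithmetic in that quotation is easy to check
(the recall and precision figures below are hypothetical, chosen
only to show the calculation):

    import math

    # Geometric mean of hypothetical recall and precision scores:
    recall, precision = 0.62, 0.58
    print(math.sqrt(recall * precision))   # ~0.60: at the barrier

    # Four entities per event, each recognized with ~90% accuracy:
    print(0.9 ** 4)                        # 0.6561, roughly 60%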
In 1999, I presented the following article in a Summer School
on Information Extraction:
http://www.jfsowa.com/pubs/template.htm
Relating Templates to Language and Logic
This is essentially the method that we have been implementing
at VivoMind: use conceptual graphs instead of templates and
rely on graph matching for IE. But in 1999, we did not have
a way to do the graph matching with sufficient speed to be
competitive.
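
To suggest the flavor of that approach, here is a toy sketch in
Python (a drastic simplification for this note, not the VivoMind
code): a conceptual graph reduced to relation triples, with IE
done by matching a query graph instead of filling a template:

    # "John presented a paper at a summer school."
    doc_graph = {
        ("Present", "Agnt", "Person:John"),
        ("Present", "Thme", "Paper"),
        ("Present", "Loc",  "SummerSchool"),
    }

    # Query: who presented something, and where?
    query = {
        ("Present", "Agnt", "?who"),
        ("Present", "Loc",  "?where"),
    }

    def match(query, graph):
        """Bind query variables (marked '?') if every query triple
        unifies with some graph triple.  Greedy, no backtracking --
        enough for a toy example."""
        bindings = {}
        for q in query:
            for g in graph:
                trial = dict(bindings)
                ok = True
                for qt, gt in zip(q, g):
                    if qt.startswith("?"):
                        if trial.setdefault(qt, gt) != gt:
                            ok = False
                            break
                    elif qt != gt:
                        ok = False
                        break
                if ok:
                    bindings = trial
                    break
            else:
                return None
        return bindings

    print(match(query, doc_graph))
    # {'?who': 'Person:John', '?where': 'SummerSchool'}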
Since then, the high-speed analogy engine that Arun Majumdar
designed has reversed the usual performance comparison between
symbolic and statistical methods. Instead of analyzing large
volumes of text and summarizing the results in statistics, we
translate the texts to conceptual graphs, encode those graphs
in a compact form, and index them. When we want to find
matching graphs, we can find them in logarithmic time. That's
fast enough.
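
Arun's algorithms are far more sophisticated than anything I can
show in a few lines, but a toy version of the indexing step
conveys the idea: give each graph a canonical key, keep the keys
sorted, and binary-search for a match in O(log n):

    import bisect

    def canonical_key(triples):
        """A canonical encoding: sort the triples and join them."""
        return "|".join(",".join(t) for t in sorted(triples))

    graphs = [
        {("Present", "Agnt", "Person:John"), ("Present", "Thme", "Paper")},
        {("Sell", "Agnt", "Company:A"), ("Sell", "Thme", "Product")},
        {("Hire", "Agnt", "Company:B"), ("Hire", "Thme", "Person:Ann")},
    ]

    # Build the index once: a sorted list of (key, graph id) pairs.
    index = sorted((canonical_key(g), i) for i, g in enumerate(graphs))
    keys = [k for k, _ in index]

    def lookup(query_graph):
        """Find an exactly matching graph in logarithmic time."""
        k = canonical_key(query_graph)
        pos = bisect.bisect_left(keys, k)
        if pos < len(keys) and keys[pos] == k:
            return index[pos][1]
        return None

    print(lookup({("Sell", "Thme", "Product"),
                  ("Sell", "Agnt", "Company:A")}))   # -> 1

Exact lookup is the trivial case; approximate matching takes more
machinery, but the same index-then-search pattern applies.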
Following is a recent talk, which I presented at MJCAI:
http://www.jfsowa.com/talks/futures.pdf
Future directions for semantic systems
We have not tested our system on the MUC documents, but we were
involved in a comparison with a dozen other systems on documents
from the US Dept. of Energy. The task involved reading documents,
extracting certain information from them, and displaying the
information in tables. The score was the number of correct
entries in the tables.
All but two of the systems failed to exceed the 60% barrier.
One got 73% correct, and we got 96% correct. The methods we
used are summarized in the futures.pdf slides. The following
slides (and the readings on the final slide) present a bit more:
http://www.jfsowa.com/talks/pursue.pdf
Statistical methods are useful as a supplement for many purposes.
But you can't do semantics without using symbolic methods.
John