Pat and Ed, (01)
Nobody "extracts" common sense. They just use it. (02)
PH> ... but because [knowledge in text is] 'implicit'
> doesn't mean it can be extracted by any algorithm. (03)
As an example of an alternative approach to natural language,
note that machine translation systems that attempt to map the
source language to a "logical form" and then to a target NL have
failed. The most widely used MT system is SYSTRAN, which began
as the Georgetown Automatic Translator (GAT), for which research
was *terminated* in 1963. The GAT-SYSTRAN method is based on long
lists of word and phrase pairs from the source to the target. (04)
Many recent MT systems apply statistical techniques to bilingual
corpora to find patterns in the source language that match patterns
in the target. Although statistical methods are very different from
GAT, they both succeed for similar reasons: the knowledge needed
for language understanding is encoded in the surface patterns, and
much can be done without an intermediate form that resembles logic. (05)
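As a toy illustration (a made-up three-sentence corpus, nothing like a real MT system), simply counting which source and target words co-occur in aligned sentence pairs is already enough to derive a crude bilingual lexicon from surface patterns alone, with no logical form in between:

```python
from collections import Counter

# Toy aligned corpus (hypothetical data): English/French sentence pairs.
corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
]

# Count how often each (source, target) word pair co-occurs
# in aligned sentence pairs.
cooc = Counter()
for src, tgt in corpus:
    for s in src.split():
        for t in tgt.split():
            cooc[(s, t)] += 1

# For each source word, keep the target word it co-occurs with most often.
lexicon = {}
for (s, t), n in cooc.items():
    if n > lexicon.get(s, (None, 0))[1]:
        lexicon[s] = (t, n)

print(lexicon["house"][0])  # -> maison
```

Real systems use far more refined alignment models, but the principle is the same: the translation knowledge is in the surface co-occurrences.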
EB> ... the very good and effective work on knowledge acquisition
> from unstructured text that is "guided" by a reference ontology.
> That approach provides the search engine with a "starter ontology"
> that defines the principal concepts and relationships in the domain
> and then extends that knowledge base (by Bayesian analysis) by
> extracting and interpreting natural language from a broader corpus. (06)
I agree that some kind of ontology is important, and the methods I
have been proposing could benefit from whatever ontology is provided.
Some serious research issues: (07)
1. How big, how detailed, and how formal should that starter be? (08)
2. What kind of internal representation would be useful? (09)
3. Is a closed-form "definition" of the symbols necessary? (010)
4. What methods of learning, statistics, analogy, pattern matching,
*and* logic are appropriate? (011)
5. What intermediate goals would promote progress, and what goals
would be more distracting or misleading than helpful? (012)
> Find me, anywhere on the Web other than in Cyc, an account
> of the different senses of 'cover' used in... (013)
> Or talk to a linguist about the many senses of "in" used in
> English (approximately 30, though it is hard to be exact),
> which require an ontology to be used in order to disambiguate them. (014)
What I am questioning is the need for an a priori list of all
possible interpretations. That is not the assumption that
underlies GAT or the statistical methods of MT. (015)
Lexicographers dutifully prepare such lists in unabridged
dictionaries such as the OED or the MW 3rd. Those lists are
useful for some purposes, but 99.44% of the world's population
get along quite well without them. People who use dictionaries
to learn a foreign language would probably be happy with word
and phrase pairs derived automatically by statistical methods. (016)
> This [data from Google] tells one nothing more than that the
> words are associated. That is not enough to state a coherent
> proposition, let alone a coherent piece of ontological content. (017)
What the data from Google shows is that the information is there.
A student who was learning English could read that information
to learn enough about the use of those words to translate any
of that information to another language, natural or artificial. (018)
> How does one extract information about relationships from free
> text or word associations? Associations, remember, are symmetrical. (019)
"Extracting" is not the goal. The goal is to read a text in a
natural language in order to solve some problem that one would
normally ask an intelligent human being to do. Many of those
problems could be solved by variations of pattern matching
as described above. (020)
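For instance, a single surface pattern such as "X such as Y" already yields directional relations, not merely symmetrical associations. A toy sketch (hypothetical text and one hand-written pattern, nothing more):

```python
import re

# Hypothetical snippet of free text such as one might find on the Web.
text = ("Metals such as copper and iron conduct electricity. "
        "Diseases such as malaria are spread by mosquitoes.")

# One surface pattern ("X such as Y and Z") gives an *asymmetric*
# relation: the class on the left, its instances on the right.
pattern = re.compile(r"(\w+) such as (\w+)(?: and (\w+))?")

facts = []
for m in pattern.finditer(text):
    cls = m.group(1).lower()
    for inst in m.groups()[1:]:
        if inst:
            facts.append((inst.lower(), "is-a", cls))

print(facts)
# -> [('copper', 'is-a', 'metals'), ('iron', 'is-a', 'metals'),
#     ('malaria', 'is-a', 'diseases')]
```

Nothing here requires an a priori list of all senses of "such as"; the pattern and the text together carry the relational information.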
> As I've already pointed out, most common sense is never said or
> written to other people. (021)
Precisely! And there is no reason why it should be said, written,
or typed into a computer system that processes language. (022)
> I don't know of ANY methods or projects (including analogical
> structure matching, by the way, which is being used actively by
> dozens of people at Northwestern, where it was invented) which
> can be said to reliably extract a single nontrivial ontological
> proposition from the entire Web. (023)
Their algorithms take polynomial time -- quadratic or worse in the
size of the input. It is impossible to process the WWW (or even
the data on your laptop) with algorithms whose running time grows
faster than linearly (or nearly so) in the amount of data. (024)
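Some back-of-the-envelope arithmetic (the figures below are illustrative assumptions, not measurements) makes the point:

```python
# Illustrative, assumed figures: ~10**12 indexed documents and a
# machine performing 10**9 elementary comparisons per second.
docs = 10**12
ops_per_sec = 10**9
seconds_per_year = 3600 * 24 * 365

# A single linear pass over the documents is feasible;
# an all-pairs (quadratic) algorithm is not.
linear_years = docs / ops_per_sec / seconds_per_year
quadratic_years = docs**2 / ops_per_sec / seconds_per_year

print(f"linear pass: {linear_years:.5f} years")
print(f"all pairs:   {quadratic_years:.2e} years")
```

Under these assumptions the linear pass takes well under a day, while the quadratic algorithm takes tens of millions of years.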
I'm not claiming that analogical methods applied to the WWW will
solve all the problems of NL understanding tomorrow. But I do
claim that we will *never* do that if we must first construct
a knowledge base or ontology that remotely resembles Cyc. (025)
John (026)
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx (027)