On Apr 3, 2008, at 9:52 AM, John F. Sowa wrote:
Getting all the way to learning from reading is a big step.
So, any ideas about a panel?
An important intermediate step is to tag text, such as
web pages, with the kinds of semantic tags that have been
envisioned for the Semantic Web and other efforts.
We are, in fact, automatically producing text meaning
representations (TMRs), on the basis of incomplete ontology,
lexicon, and fact-repository knowledge resources, for
sentences (from the web) that contain a minimum of unknown
lexical units (precision over recall). The idea is to learn
lexicon entries (at least syntax, semantics, and linking
information) and ontology concepts standing in one-to-one
correspondence to them.
So, the overall idea is to test whether the handcrafted
lexicon/ontology complex plus the semantic analyzer that we
have (and that CYC, to my knowledge, does not really have)
can serve as a bootstrapping device for a spiral methodology
of learning: on each subsequent pass, text analysis uses a
broader-coverage lexicon/ontology/fact repository, leading,
one hopes, to better results and better-quality learned
entities, and so on ad infinitum.
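The control flow of that spiral can be sketched in a few lines. To be clear: everything below is a hypothetical toy (word-to-concept pairs standing in for real lexicon entries and TMRs), only meant to illustrate the precision-over-recall filtering and the pass-to-pass broadening of coverage, not any actual system component:

```python
# Toy sketch of the spiral (bootstrapping) learning loop described
# above. All names and data are hypothetical illustrations of the
# control flow; real TMRs and lexicon entries are far richer than
# word -> concept pairs.

def analyze(sentence, lexicon):
    """Stand-in 'semantic analyzer': a real one would build a full
    text meaning representation (TMR), not a list of concepts."""
    return [lexicon[w] for w in sentence.split() if w in lexicon]

def spiral_learning(corpus, lexicon, passes=2, max_unknown=1):
    """On each pass, analyze only sentences with at most
    `max_unknown` unknown lexical units (precision over recall);
    each unknown word yields a hypothesized lexicon entry and an
    ontology concept in one-to-one correspondence, so the next
    pass runs with broader coverage."""
    ontology = set(lexicon.values())
    for _ in range(passes):
        for sentence in corpus:
            unknown = [w for w in sentence.split() if w not in lexicon]
            if len(unknown) > max_unknown:
                continue                      # too risky: skip for now
            tmr = analyze(sentence, lexicon)  # context for the hypotheses
            for w in unknown:
                concept = w.upper()           # hypothesized concept
                lexicon[w] = concept          # hypothesized lexicon entry
                ontology.add(concept)
    return lexicon, ontology

# "cats run" is skipped on pass 1 (two unknowns), but once "dogs run"
# teaches "run", pass 2 can handle it: the spiral in miniature.
lex, onto = spiral_learning(["cats run", "dogs run"], {"dogs": "DOG"})
```

The point of the toy is the second pass: a sentence that was too risky the first time around becomes analyzable once coverage has broadened.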
There are tons of issues involved, of course.
Human-generated tags tend to be (a) extremely expensive
to insert, (b) inconsistent from one tagger to another,
and (c) unlikely to be written by the original authors.
Right. That's why SW, if it happens, will happen for a reason
other than volunteer mark-up.
The kind of learning/acquisition needed for tag generation
is less than what's needed for full understanding. But if
one tool can do part of the work by inserting tags, other
tools and systems would have a leg up in doing more.
First, our TMRs are much more involved than typical SW tags (though
they can be represented as such; we have some publications about
this). Second, the TMRs currently produced will certainly require
further processing to support learning. Extensions include reference
(not just co-reference) resolution, indirect speech acts, unexpected
("ill-formed") input, the ubiquitous non-literal language, etc. We
are working on the first two and have worked a bit on the latter two.
But SO much work is still needed.
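To illustrate the first point, that a TMR, though richer, can be flattened into SW-style tags, here is an invented toy frame (the frame format and concept names are made up for illustration; they are not drawn from the publications mentioned):

```python
# Invented toy TMR (frames) for "The dog chased the cat" and its
# flattening into Semantic-Web-style triples. The frame layout and
# concept names are hypothetical illustrations only.

tmr = {
    "CHASE-1": {
        "instance-of": "CHASE",
        "agent": "DOG-1",
        "theme": "CAT-1",
        "time": "before-speech-time",
    },
    "DOG-1": {"instance-of": "DOG"},
    "CAT-1": {"instance-of": "CAT"},
}

def tmr_to_triples(tmr):
    """Flatten each frame into (subject, property, value) triples,
    the shape SW tags/RDF would use. Richer TMR structure (modality,
    aspect, sets) would need reification or similar machinery."""
    return [(frame, prop, value)
            for frame, slots in tmr.items()
            for prop, value in slots.items()]

triples = tmr_to_triples(tmr)
```

The flattening is lossless only in this trivial case; that is one sense in which TMRs are "much more involved" than typical tags.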
One more issue is feasibility. While we don't want human annotation/
analysis, we think there may be a good way of using people to
validate the results of learning, so as to improve its quality
faster (and therefore more cheaply). We have implemented an
environment for this type of work. It's not optimal, certainly...
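One plausible design choice for such an environment (a guess on my part; the message above does not specify how the validation is organized) is to queue learned candidates by frequency, so scarce human effort goes to the highest-impact entries first:

```python
# Hypothetical sketch of prioritizing human validation of learning
# results. The interactive environment itself is not modeled here;
# the frequency-ordered queue is an assumed design, offered only as
# one way to make quality improve faster (and therefore cheaper).

from collections import Counter

def validation_queue(candidates):
    """Order hypothesized (word, concept) pairs for human review,
    most frequently proposed first."""
    counts = Counter(candidates)
    return [pair for pair, _ in counts.most_common()]

candidates = [("run", "RUN"), ("xyzzy", "XYZZY"), ("run", "RUN"),
              ("run", "RUN"), ("eat", "EAT")]
queue = validation_queue(candidates)
```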