There was a discussion on the Corpora List about the differences
between WordNet and ontologies. The discussions were similar
to the ones that have occurred many times in Ontolog Forum.
When that thread died down, I received an offline note from
somebody who has been working for years in computational
linguistics. Following is a slightly edited version of
some excerpts from his note and my responses.
For more background on some of the issues, see
http://www.jfsowa.com/talks/goal3.pdf
John
-------- Original Message --------
> While I can agree that language continually introduces new
> meanings, and that existing dictionaries can't anticipate all
> meanings that will occur in text, the bulk of the meanings that
> appear in text can be recognized by human readers and assigned
> to the senses appearing in a good dictionary.
To a certain extent, yes. But as you know, inter-annotator
agreement, even among experts, is not very high. Among people
who don't have some training in linguistics or lexicography,
it's rather low.
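To make "not very high" concrete, agreement between two annotators
is usually reported as a chance-corrected statistic such as Cohen's
kappa. Here is a minimal Python sketch; the sense labels and the
agreement figures are made up for illustration, not taken from any
particular study:

    from collections import Counter

    def cohens_kappa(a, b):
        # kappa = (p_o - p_e) / (1 - p_e), where p_o is the
        # observed agreement and p_e is the agreement expected
        # by chance from each annotator's label distribution.
        n = len(a)
        p_o = sum(x == y for x, y in zip(a, b)) / n
        fa, fb = Counter(a), Counter(b)
        p_e = sum(fa[s] * fb[s] for s in fa) / (n * n)
        return (p_o - p_e) / (1 - p_e)

    # Two annotators tag the same 8 occurrences of a word with
    # senses s1..s3 from a dictionary; they agree on only 5.
    ann1 = ["s1", "s1", "s2", "s2", "s3", "s1", "s2", "s1"]
    ann2 = ["s1", "s2", "s2", "s2", "s3", "s1", "s1", "s2"]
    print(round(cohens_kappa(ann1, ann2), 2))  # 0.38 -- rather low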
When I wrote my first book (1984), I was hopeful that word senses
had a good mapping to concept types. But I was very sympathetic
to Wittgenstein's language games. And Ch 7 of that book (Limits
of Conceptualization) talked about the many issues and exceptions.
Over the years, my opinions about word senses have come very close
to Sue Atkins's remark "I don't believe in word senses," which
Adam Kilgarriff took as the title of a paper:
http://www.kilgarriff.co.uk/Publications/1997-K-CHum-believe.pdf
> The strategy that we have been following to date is to employ
> a limited number of skilled human beings to create for the
> computer the data it will need in order to recognize the senses
> of words in text.
That could be useful. At our VivoMind company, we take advantage
of multiple resources, but we don't accept any of them as infallible.
See http://www.jfsowa.com/pubs/paradigm.pdf .
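One simple way to avoid treating any single resource as infallible
is to use each one as weighted evidence and keep a ranked list of
candidate senses rather than committing to one answer. The sketch
below is mine, with made-up resource names and weights; it is not
VivoMind's actual method, which paradigm.pdf describes at a higher
level:

    def rank_senses(candidates, weights):
        # candidates: resource name -> sense labels it proposes
        # weights: resource name -> how much to trust it
        scores = {}
        for resource, senses in candidates.items():
            w = weights.get(resource, 1.0)
            for sense in senses:
                scores[sense] = scores.get(sense, 0.0) + w
        # A ranked list, not a hard decision: evidence from
        # context can still overturn the top candidate.
        return sorted(scores, key=scores.get, reverse=True)

    print(rank_senses(
        {"wordnet": ["bank.n.1", "bank.n.2"],
         "domain_lexicon": ["bank.n.2"]},
        {"wordnet": 1.0, "domain_lexicon": 2.0}))
    # ['bank.n.2', 'bank.n.1']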
We have also been getting good results by mapping everything
to conceptual graphs, even for languages for which our lexical
resources are rudimentary. Then by analyzing the patterns in
the CGs, we can improve the rudimentary ontology and get
better results by reanalyzing the same documents. You can
iterate a couple of times with some human corrections of
the more egregious errors. The results are fairly good.
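In outline, the loop looks something like the sketch below. All of
the function names are hypothetical -- the actual VivoMind code is
not public -- but the control flow follows the description above:
parse to CGs, mine the patterns, refine the ontology, reparse.

    def analyze_corpus(docs, ontology, parse, mine_patterns,
                       refine, human_review, rounds=2):
        # parse: (document, ontology) -> conceptual graph
        # mine_patterns: list of CGs -> recurring patterns
        # refine: fold the patterns back into the ontology
        # human_review: correct the most egregious errors
        for _ in range(rounds):
            graphs = [parse(d, ontology) for d in docs]
            patterns = mine_patterns(graphs)
            ontology = refine(ontology, patterns)
            ontology = human_review(ontology)
        # Final pass with the improved ontology.
        return [parse(d, ontology) for d in docs], ontology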
> Only one new methodology in recent years has succeeded in
> creating very large collections of knowledge about the world
> without incurring the enormous expense of employing a large
> enough staff, namely Wikipedia.
I agree that Wikipedia is a great resource for many purposes.
But our small company has been getting contracts for specialized
domains for which Wikipedia's generic data is insufficient.
For examples, see http://www.jfsowa.com/talks/goal7.pdf .
And note the many microsenses of the word 'ontology' in this
thread. The same issues arise with any topic when you get
into the details.
One of the most notorious words is 'God'. Just ask anybody
"Do you believe in God?" Independently of their answer, ask
"How would you describe God?" It's unlikely that you'll get
any two answers that are remotely similar.
Parsing sentences that contain the word 'God' is easy. But
interpreting them and reasoning about the results is not.
> The difficulty of the Wiki- approach is that we have not yet
> devised a representation of the information we want to collect
> that can be communicated intuitively to the general public such
> that anyone with an understanding of text can create entries
> that a computer could use to understand other texts.
The only general representation that we have found useful for
communicating with anybody -- novices or experts -- is their
native language. Controlled English is good -- if they've
had some training. But it's easier to parse their unrestricted
English than to interpret what they thought they were saying
in any artificial notation they don't feel comfortable with.
> Even creating the means for people to enter structured subsets
> of that knowledge would be an advance over using natural
> language as the only means of representation.
The structure of any structured notation will depend very strongly
on the structure of the subject matter. And many specialized
graphical notations have been developed already.
Note the VivoMind solution for the legacy reengineering problem
(goal7.pdf): Translate whatever notation people already know
(e.g., COBOL) to conceptual graphs. For output, translate CGs to
whatever structured notation the customer asks for (e.g., UML).
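The design point worth noting is that a common interchange form
turns an N x M translation problem into N + M translators: one
parser per input notation, one generator per output notation.
A minimal sketch, with registry names that are mine rather than
VivoMind's:

    parsers = {}     # input notation -> function(text) -> CG
    generators = {}  # output notation -> function(CG) -> text

    def translate(text, source, target):
        # e.g. translate(cobol_src, "COBOL", "UML"): parse the
        # legacy notation into a conceptual graph, then generate
        # whatever notation the customer asks for.
        cg = parsers[source](text)
        return generators[target](cg)

    # Once N parsers and M generators are registered, any of
    # the N inputs can be rendered in any of the M outputs.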
Guideline for interpreting what anybody says in any artificial
notation you choose: "You may think you understand what I said,
but you don't realize that what I said is not what I meant."
John