John,
Your comments point out a few issues on which it seems we differ. To focus
on those:
[PC]
>> Those meanings that can be reliably distinguished (>98%) by motivated
>> (rewarded for accuracy) human annotators.
>
> There are no such meanings -- except in very special cases.
* I think that human performance with real informative text is typically
above that level, when one is trying to be accurate rather than sloppy or
hurried. If it weren't, communication would be virtually impossible, and I
would suggest that people usually communicate quite well when they are trying
to be clear. The difficulty of **testing** that number is that one has to
start with some inventory of senses, and the most detailed inventory yet used
for such tests by NL researchers is WordNet, which is not a good standard for
such testing. This is pretty much the point I was making: meaningful sense
disambiguation needs a much better inventory of senses than the ones people
are using. Until we develop a logic-based word-sense inventory intended for
broad use, I don't see how the maximum agreement could be tested.
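The agreement figure in question (>98%) is usually measured as raw pairwise inter-annotator agreement over a fixed sense inventory. A minimal sketch of that computation (the sense labels and items here are hypothetical, not from any actual study):

```python
from itertools import combinations

def pairwise_agreement(annotations):
    # annotations: one list per text item, holding the sense label
    # each annotator assigned, drawn from a fixed sense inventory
    agree = total = 0
    for labels in annotations:
        for a, b in combinations(labels, 2):   # every annotator pair per item
            agree += (a == b)
            total += 1
    return agree / total

# Hypothetical annotations of "bank" by three annotators on two items:
items = [["bank.n.1", "bank.n.1", "bank.n.1"],   # all three agree
         ["bank.n.2", "bank.n.2", "bank.n.1"]]   # one annotator differs
# pairwise_agreement(items) -> 4/6, far below a 98% threshold
```

The point stands either way: without an agreed sense inventory to label against, there is nothing for such a measurement to be run over.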
[JFS]
> Unfortunately, there is no finite "set of senses" that can be used to
> achieve "human-level interpretation of a broad range of texts."
* That is a bold claim, and even acknowledging that it is difficult to prove
such a negative, my observation is that no remotely applicable test has yet
been conducted to see whether such a claim is even plausible. One of the
points I made is that developing such a set of senses would be a very
expensive process, and that process has never been funded. I suspect that
WordNet (or its derivative used in OntoNotes) is used because the statistical
NLP programs that use WordNet probably wouldn't perform much better even with
a perfect set of senses. So most NLP effort is focused on other tasks.
[JFS]
> MT researchers have been working for over 60 years (since 1950) on the
> task of designing an Interlingua that could be used for automated translation
> from any NL to any other NL. All such attempts have failed.
* And they will continue to fail until a serious and adequately funded effort
is made to develop such an interlingua, in the form of a logic-based ontology
that is related via an NLU program to a meaningful text corpus. Efforts prior
to 1990 did not have an adequate basis in logic, and efforts since then have
been too restricted to have any hope of achieving the goal. Even in the
well-funded CALO project, the integration of ontology and NLU did not seem to
be a major part of the effort.
[JFS]
> For the past half century, the most successful MT system has been Systran,
> which is based on the Georgetown Automatic Translator (GAT), for which
> research was terminated in 1963. It is based on hand-coded word and phrase
> pairs for each of the language pairs it handles.
* It is clear (from Google Translate, among other programs) that one can get
a somewhat useful translation solely from statistical analysis of parallel
corpora, but that tactic, however useful, is very far from actual
understanding of text at a human level - that is, a level sufficient for one
person to send a message in unstructured NL to a machine, and to expect that
the machine will make important or mission-critical decisions (as well as a
person would) based on its interpretation, without further human input. No
current statistical NL program comes remotely close. Statistics might be
pushed to that level, provided that the system can be trained on properly
semantically annotated text - but that also would depend on an accurate
inventory of word senses.
[JFS]
> Fundamental principle: People think in *words*, not in *word senses*.
* Really? I sure don't. Without the textual context to disambiguate words,
communication would be extremely error-prone. Where does that notion come
from?
[JFS]
> Furthermore, the "senses" of similar words in different languages don't line
> up.
* In many cases no, and in other cases, yes. But when it is true, that merely
indicates that different cultures may emphasize different aspects of entities
in the largely continuous world. All such meanings (specifying the differences
between closely related words) can still be specified with a reasonably small
inventory of semantic primitives.
[JFS]
> Please note the term 'microsense' by Alan Cruse.
* I am aware of that notion, but I still find that virtually everything I
have an interest in communicating or learning can be described by a discrete
set of necessary properties in an ontology (though rarely by both necessary
and sufficient conditions). The lack of "sufficient" properties alongside the
necessary ones provides a lot of wiggle room, so that various entities (those
"microsenses") may be shoehorned into the same category. **BUT** although the
speaker (or writer) may have something in mind more detailed than the generic
entity described by the necessary properties, all the listener can or will
understand is the necessary properties associated with a word sense, unless
the context makes clear that a more specific entity is intended. When people
use a word with many "microsenses" without disambiguating elaboration, the
listener will typically understand only the generic meaning and leave the
details unspecified, because, unless the speaker is intending to confuse, the
details will not be needed to understand the intended meaning; i.e., it
doesn't matter what subvariety of entity is involved. If someone said that a
dog bit her ear, I wouldn't assume anything about what kind of dog it was,
and would assume from the lack of specificity that the "microsense" was
irrelevant to the idea she intended to convey - if it were relevant, she
would specify. The notion of "microsense" as a theoretical concept is
reasonable, but in cooperative communication it is rarely important.
Pat

Patrick Cassidy
MICRA Inc.
cassidy@xxxxxxxxx
1-908-561-3416
-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John F Sowa
Sent: Thursday, August 01, 2013 9:33 AM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] Context and Inter-annotator agreement
Pat,
JFS
>> But it is possible, for texts where clarity and precision are
>> critical, for the author to use tools that can help detect ambiguity,
>> avoid words that could be problematical, and suggest simpler syntax
>> for phrases that are overly complex.
PC
> Yes, that was the intended implication. But what I had in mind goes
> beyond use of controlled natural language, useful as that is.
The Boeing Language Checker was designed for producing controlled NLs.
But I cited it because the same techniques can be adapted to generate any
formal logic or KR language.
All methods of knowledge acquisition depend on NL input. A controlled NL is
just a stage on a continuum between language and logic.
JFS
>> what do you mean by "distinguishable" meanings?
PC
> Those meanings that can be reliably distinguished (>98%) by motivated
> (rewarded for accuracy) human annotators.
There are no such meanings -- except in very special cases. The following note
by Adam K. is an example where the human annotators reached a level of 99.4% --
but the choice is unusually well defined.
PC
> this [requires] a progressive iterative effort to develop (at least
> one) NLU program and a set of senses that it can understand so as to
> achieve human-level interpretation of a broad range of texts.
Unfortunately, there is no finite "set of senses" that can be used to achieve
"human-level interpretation of a broad range of texts."
MT researchers have been working for over 60 years (since 1950) on the task of
designing an Interlingua that could be used for automated translation from any
NL to any other NL. All such attempts have failed.
For the past half century, the most successful MT system has been Systran,
which is based on the Georgetown Automatic Translator (GAT), for which research
was terminated in 1963. It is based on hand-coded word and phrase pairs for
each of the language pairs it handles.
Over the years, the developers built up millions of such pairs, but no clearly
defined set of "senses". It achieved its success purely by brute force, and it
is still in use as Babelfish.
Google uses a similar brute-force method with the word and phrase pairs chosen
by statistics. They don't have any set of "senses".
Fundamental principle: People think in *words*, not in *word senses*.
The senses found in dictionaries are based on the examples in whatever
citations were used by the lexicographers who wrote the definitions.
There is a very "long tail" to that distribution: the more citations you
collect, the more senses you get. It doesn't converge because people are
constantly using words in new "senses".
Furthermore, the "senses" of similar words in different languages don't line
up. That's why Systran and Google match phrases, not individual words. Even
then, there is a high error rate because the patterns are often split in
different parts of the sentence or in neighboring sentences.
Please note the term 'microsense' by Alan Cruse. He learned from long, hard
experience that the senses vary by small increments even in very similar
documents. See http://www.jfsowa.com/talks/goal.pdf
John
-------- Original Message --------
Subject: Re: [Corpora-List] WSD / # WordNet senses / Mechanical Turk
Date: Tue, 16 Jul 2013 14:40:50 +0100
From: Adam Kilgarriff <adam@xxxxxxxxxxxxxxxxxx>
To: Benjamin Van Durme <vandurme@xxxxxxxxxx>
CC: corpora@xxxxxx
Re: the 0.994 accuracy result reported by Snow et al: there was precisely one
word used for this task, 'president', with the 3-way ambiguity between
1) executive officer of a firm, corporation, or university
2) head of a country (other than the U.S.)
3) head of the U.S., President of the United States
Open a dictionary at random and you'll see that most polysemy isn't like that.
The result, based on one word, provides no insight into the difficulty of the
WSD task.
Adam
On 16 July 2013 13:32, Benjamin Van Durme <vandurme@xxxxxxxxxx> wrote:
Rion Snow, Brendan O'Connor, Daniel Jurafsky and Andrew Y. Ng. Cheap and Fast -
But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks.
EMNLP 2008.
http://ai.stanford.edu/~rion/papers/amt_emnlp08.pdf
"We collect 10 annotations for each of 177 examples of the noun
'president' for the three senses given in SemEval. [...] performing
simple majority voting (with random tie-breaking) over annotators results in
a rapid accuracy plateau at a very high rate of 0.994 accuracy. In fact,
further analysis reveals that there was only a single disagreement between
the averaged non-expert vote and the gold standard; on inspection it was
observed that the annotators voted strongly against the original gold label
(9-to-1 against), and that it was in fact found to be an error in the
original gold standard annotation. After correcting this error, the
non-expert accuracy rate is 100% on the 177 examples in this task. This is a
specific example where non-expert annotations can be used to correct expert
annotations."
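[Editor's note: the "simple majority voting (with random tie-breaking)" scheme quoted above can be sketched as follows. The label names and example items are hypothetical, not Snow et al.'s data.]

```python
import random
from collections import Counter

def majority_vote(labels, rng):
    # labels: the sense labels several annotators gave one item
    counts = Counter(labels)
    top = max(counts.values())
    winners = sorted(l for l, c in counts.items() if c == top)
    return rng.choice(winners)          # random tie-breaking

def vote_accuracy(items, gold, seed=0):
    # items: one list of annotator labels per example; gold: expert labels
    rng = random.Random(seed)
    votes = [majority_vote(labels, rng) for labels in items]
    return sum(v == g for v, g in zip(votes, gold)) / len(gold)

# Two hypothetical instances of "president", ten annotators each:
items = [["sense1"] * 9 + ["sense2"], ["sense3"] * 10]
gold = ["sense1", "sense3"]
# vote_accuracy(items, gold) -> 1.0 (no ties, so the result is deterministic)
```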
Xuchen Yao, Benjamin Van Durme and Chris Callison-Burch. Expectations
of Word Sense in Parallel Corpora. NAACL Short. 2012.
http://cs.jhu.edu/~vandurme/papers/YaoVanDurmeCallison-BurchNAACL12.pdf
"2 Turker Reliability

While Amazon's Mechanical Turk (MTurk) has been considered in the
past for constructing lexical semantic resources (e.g., (Snow et al.,
2008; Akkaya et al., 2010; Parent and Eskenazi, 2010; Rumshisky,
2011)), word sense annotation is sensitive to subjectivity and
usually achieves low agreement rates even among experts. Thus we
first asked Turkers to re-annotate a sample of existing gold-standard
data. With an eye towards cost savings, we also considered how many
Turkers would be needed per item to produce results of sufficient
quality.
Turkers were presented sentences from the test portion of the word
sense induction task of SemEval-2007 (Agirre and Soroa, 2007),
covering 2,559 instances of 35 nouns, expert-annotated with OntoNotes
(Hovy et al., 2006) senses. [...]
We measure inter-coder agreement using Krippendorff's Alpha
(Krippendorff, 2004; Artstein and Poesio, 2008), [...]"
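[Editor's note: for reference, Krippendorff's Alpha for nominal data, the statistic cited above, is one minus the ratio of observed to expected disagreement. A minimal sketch, not the authors' implementation; the example labels are hypothetical.]

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    # units: one list per item, holding the labels the coders assigned it;
    # items with fewer than two labels carry no agreement information
    units = [u for u in units if len(u) >= 2]
    coincidence = Counter()
    for u in units:
        for a, b in permutations(u, 2):         # ordered label pairs per item
            coincidence[(a, b)] += 1.0 / (len(u) - 1)
    marginals = Counter()
    for (a, _b), w in coincidence.items():
        marginals[a] += w
    n = sum(marginals.values())
    d_observed = sum(w for (a, b), w in coincidence.items() if a != b)
    d_expected = sum(marginals[a] * marginals[b]
                     for a in marginals for b in marginals if a != b) / (n - 1)
    return 1.0 - d_observed / d_expected

# Perfect agreement on two items with two categories yields alpha = 1.0:
units = [["sense1", "sense1"], ["sense2", "sense2"]]
# krippendorff_alpha_nominal(units) -> 1.0
```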
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J