
Re: [ontolog-forum] Amazon vs. IBM: Big Blue meets match in battle for t

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: John F Sowa <sowa@xxxxxxxxxxx>
Date: Wed, 31 Jul 2013 02:15:56 -0400
Message-id: <51F8AB9C.60304@xxxxxxxxxxx>
Kingsley and Pat C,    (01)

> When all is said and done, you end up with the best of both worlds,
> since there's basically a slot for everything that constitutes the
> description graph to which an entity IRI (e.g., HTTP URI) resolves.    (02)

I'm happy with the idea of IRIs when used as a generalization of URLs:
unique identifiers when you need unique identifiers.    (03)

But the problem with NLP is that you never know what a word in a text
really means until *after* you understand the text.    (04)

Any attempt to replace words with IRIs as a preliminary step toward
understanding a text is not only futile but hopelessly wrongheaded.    (05)

>> Studies of inter-annotator agreement among well-trained humans show
>> that 95% agreement is very rarely achieved.  More typically, the best
>> computer systems achieve about 75% accuracy.    (06)

> In what - er - "context" is this true?  Do you have a pointer to this?
> This kind of number must be task-dependent.    (07)

The 95% figure is widely cited as a "gold standard", and the
75% figure comes from the SENSEVAL (sense evaluation) projects.    (08)

There was a recent discussion of this point on Corpora List,
which is devoted to NLP work on large volumes of NL data.    (09)

I commented that even 95% agreement would imply 15 errors per page,
and that my high-school English teacher would not consider that good.    (010)
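The arithmetic behind that remark is straightforward. The 300-words-per-page
figure below is an assumed conventional page length, not a number from the
thread:

```python
# Assumed typical page length; the 95% figure is the cited "gold standard".
words_per_page = 300
agreement = 0.95

# A 5% disagreement rate spread over a 300-word page.
errors_per_page = round(words_per_page * (1 - agreement))
print(errors_per_page)  # 15
```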

Adam Kilgarriff, who had originally published the value 95%,
acknowledged that inter-annotator agreement values are in fact
lower than that.  Following is his response.    (011)

The entire thread is archived.  You can excerpt a phrase
from the following note, put it in quotes, and find the
complete thread with your favorite search engine.    (012)

-------- Original Message --------
Subject: Re: [Corpora-List] WSD / # WordNet senses / Mechanical Turk
Date:   Tue, 16 Jul 2013 07:43:58 +0100
From:   Adam Kilgarriff <adam@xxxxxxxxxxxxxxxxxx>
To:     John F Sowa <sowa@xxxxxxxxxxx>
CC:     corpora@xxxxxx    (013)

Dear Mark, John,    (014)

Let me confess to a moment of embarrassment that I've been anxious
about for years: following SENSEVAL-1 I did a (tiny) experiment to
establish inter-annotator agreement, and came up with the 95% figure
cited by John.    (015)

On experience since, I think the findings were not sound, and it is
most unusual to get a figure that high, and I regret having published
it (and, worse, having put it in the title of a short paper from EACL-99).    (016)

For either automatic WSD, or even for the gold standard, I agree
entirely with John:    (017)

     Miss Elliott, my high-school English teacher, wouldn't give
     anyone a gold star [for work like that]    (018)

Adam    (019)

Adam Kilgarriff <http://www.kilgarriff.co.uk/> adam@xxxxxxxxxxxxxxxxxx
Director Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow University of Leeds <http://leeds.ac.uk>
/Corpora for all/ with the Sketch Engine <http://www.sketchengine.co.uk>
/DANTE: a lexical database for English <http://www.webdante.com>/
========================================    (020)

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (021)
