
Re: [ontolog-forum] Context and Inter-annotator agreement

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Patrick Cassidy" <pat@xxxxxxxxx>
Date: Thu, 1 Aug 2013 02:26:40 -0400
Message-id: <133401ce8e80$15ab0290$410107b0$@micra.com>
John and David,
   Some follow-up:    (01)

[JFS]  > There is no way that an author or speaker could write or speak 
fluently about any subject while constantly thinking about which word sense to 
select for every word.
>
  *  No, that would be too unnatural to ever work efficiently.    (02)

[JFS] > But it is possible, for texts where clarity and precision are critical, 
for the author to use tools that can help detect ambiguity, avoid words that
>  could be problematical, and suggest simpler syntax for phrases that are 
>overly complex    (03)

   *  Yes, that was the intended implication.  But what I had in mind goes 
beyond use of controlled natural language, useful as that is:    (04)

[JFS] 
> >  The Boeing Simplified English Checker helps writers comply with ASD 
>>  Simplified Technical English (STE), developed by the AeroSpace and 
>>  Defence Industries Association of Europe.
>
> The result of such a checker is much easier for non-native speakers to 
>understand.  For aircraft maintenance, that is critical for manuals used by 
>workers at airports around the world.    (05)

    *  For the more general case of making texts sufficiently unambiguous for 
an NLU program to understand, the NLU program itself would have to be part of 
the loop in developing the set of senses that it can understand.  And since NLU 
programs still have a long way to go to achieve the necessary performance, this 
implies a progressive, iterative effort to develop at least one NLU program, 
together with a set of senses it can understand, so as to achieve human-level 
interpretation of a broad range of texts.  Such iterative development is called 
a "spiral" (upwards, implied) in some circles.    (06)

>> [PC]  I would expect computers to be able, eventually, to do better than any 
>> given pair of annotators at finding the right meaning, provided that 
>> the "meanings" are in fact distinguishable.
>
> [JFS] I agree with qualifications.  But the two main qualifications are (1) 
>when do you expect "eventually" to occur,    (07)

   *  That depends strongly on how much funding (government or private) is 
directed toward the goal.     (08)

> [JFS] 
>  and (2) what do you mean by "distinguishable" meanings    (09)

  *  Those meanings that can be reliably distinguished (>98% agreement) by 
motivated (i.e., rewarded for accuracy) human annotators.    (010)
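[As an illustration of what "reliably distinguished" could mean in practice: 
inter-annotator agreement is commonly quantified with a chance-corrected 
statistic such as Cohen's kappa rather than raw percentage agreement.  The 
sketch below uses invented sense labels for the word "bank" - the data and the 
two-annotator setup are purely hypothetical.]

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical sense labels from two annotators for ten tokens of "bank".
ann1 = ["river", "finance", "finance", "river", "finance",
        "finance", "river", "finance", "finance", "river"]
ann2 = ["river", "finance", "finance", "river", "finance",
        "river", "river", "finance", "finance", "river"]
print(round(cohens_kappa(ann1, ann2), 3))  # 9/10 raw agreement, kappa = 0.8
```

Raw agreement overstates reliability when one sense dominates, which is why a 
chance-corrected measure is the usual yardstick in annotation studies.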

[JFS] > For point #2, every unabridged dictionary has a different set of word 
senses.  The number of word senses is constantly growing and changing, and many 
linguists have strong doubts about the possibility of having any precisely 
defined set.    (011)

  *  Right, right, right!  Every month, valley girls and students in college 
bull sessions are inventing new senses for words that (thankfully) never 
penetrate into the serious texts that are intended for conveying meaningful 
information.  I would expect any program for automatic language understanding 
and sense-disambiguation  to focus on the texts (and sets of meanings in them) 
that are significant enough to warrant spending money for the purpose of 
properly interpreting them and archiving the semantic information contained.  I 
would expect this set of senses to be a small fraction of all the senses 
identifiable by lexicographers, but it will still be a very large and vitally 
important  semantic lexicon, equivalent to that of a college-educated (but not 
omniscient) adult.    [I once saw a comment about some significant number of 
word senses that appear only in Shakespeare - not likely candidates].  The goal 
of human-level performance is reasonable (there is an existence proof), and 
does not require an unachievable infinite capability.    (012)

[JFS] > However, it does require a lot of training and practice to use such a 
language checker.     (013)

  *  Perhaps, but that would depend on how well the program is written.  And 
(see below) motivation will be key.    (014)

[JFS] >  And the overwhelming amount of NL data on the WWW has never been and 
never will be checked by such tools.    (015)

  *  Right, and few people will lament the exclusion of the voluminous garbage 
cluttering the Web from a database of supposed "facts".    (016)

[DE]
> I'd be willing to argue that it's the "secondary" (#2 & onwards) readers of 
>text who would be more highly motivated to annotate ambiguous terms.    (017)

  *  Perhaps, but in any case the motivation of text creators to use the 
annotating tools will be vital to success.  There are at least two identifiable 
sets of motivated text-creators: (1) those who are serious about disseminating 
meaningful information, for the public interest or any other purpose (e.g., 
college teachers and some contributors to Wikipedia); and (2) those in 
organizations that want (some of) their texts to be automatically 
interpretable, in-house or more widely.  In the latter case, the organizations 
would have to set up institutional motivators - at a minimum, recognizing where 
appropriate that such annotation (most likely voluntary) is part of the job, 
and allowing the time needed for it.  Bonuses for accuracy of annotation might 
further improve the motivation.    (018)

   There is a great deal of discussion in government agencies about wanting 
"interoperability", and there are various programs intended to achieve that 
goal, at least partly.  When they understand how much it will cost to actually 
achieve interoperability at a human level of interpretation, they will have to 
decide whether it is worth the effort.  But I am convinced that the effort is 
perfectly feasible if adequate resources are applied.  Among those resources, a 
well-tested semantic lexicon with an adequate set of distinguishable senses is, 
I believe, both vital and achievable.  For now, outside of controlled 
vocabularies, WordNet, with all its known problems, is the only thing that I 
have seen used.  We can do a lot better.    (019)

Pat    (020)

Patrick Cassidy
MICRA Inc.
cassidy@xxxxxxxxx
1-908-561-3416    (021)


-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx 
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John F Sowa
Sent: Wednesday, July 31, 2013 10:29 AM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] Context and Inter-annotator agreement    (022)

Pat,    (023)

I agree with the last sentence of the following comment.  I agree with the 
first, but I would qualify the word 'know'.  The middle sentence is more 
problematical.    (024)

PC
> The only ones who really know the "meaning" of a word are the ones who 
> created the text.  It would not be too difficult to have creators of 
> text label the senses that they intend, and through a series of 
> iterations, find a set of senses that text creators and text 
> annotators can agree on with a precision that would satisfy Miss 
> Elliott.  I find little interest at NLP meetings for work of that kind.    (025)

There is no way that an author or speaker could write or speak fluently about 
any subject while constantly thinking about which word sense to select for 
every word.    (026)

But it is possible, for texts where clarity and precision are critical, for the 
author to use tools that can help detect ambiguity, avoid words that could be 
problematical, and suggest simpler syntax for phrases that are overly complex.    (027)

Note the following passage from the Boeing web site:    (028)

Source: http://www.boeing.com/boeing/phantom/sechecker/checker.page
> A language checker is a software application that helps authors comply 
> with a controlled-language specification. Examples of controlled 
> languages include ASD Simplified Technical English, Attempto 
> Controlled English, Caterpillar Technical English, Global English and 
> the U.S. government's Plain Language specification.
>
> The Boeing Simplified English Checker helps writers comply with ASD 
> Simplified Technical English (STE), developed by the AeroSpace and 
> Defence Industries Association of Europe.    (029)

The result of such a checker is much easier for non-native speakers to 
understand.  For aircraft maintenance, that is critical for manuals used by 
workers at airports around the world.    (030)
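
[A toy sketch of the kind of ambiguity flagging such a checker performs.  The 
sense inventory and sample sentence below are invented for illustration; real 
checkers such as Boeing's work from curated controlled-language dictionaries, 
not a hand-made table like this.]

```python
# Toy sense inventory: words mapped to the number of distinct senses a
# hypothetical NLU system can distinguish.  These entries are invented,
# not taken from any real controlled-language specification.
SENSE_COUNTS = {"bank": 2, "file": 3, "check": 4, "manual": 2,
                "aircraft": 1, "inspect": 1, "the": 1}

def flag_ambiguous(text, inventory=SENSE_COUNTS):
    """Return (word, sense_count) pairs the author should disambiguate."""
    flagged = []
    for token in text.lower().split():
        word = token.strip(".,;:")
        # Unknown words are assumed unambiguous in this toy version.
        senses = inventory.get(word, 1)
        if senses > 1:
            flagged.append((word, senses))
    return flagged

print(flag_ambiguous("Inspect the aircraft manual. Check the file."))
# → [('manual', 2), ('check', 4), ('file', 3)]
```

A production checker would also enforce approved syntax and suggest 
replacements, but the core loop - look each word up against a fixed sense 
inventory and flag anything with more than one admissible reading - is the same 
idea.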

That result isn't sufficiently precise to be translated to logic, but it is 
usually easier to translate to other NLs by automated tools.    (031)

It can also be a first step toward a semi-automated generation of some 
knowledge representation language.  It would also be very useful for checking 
the comments in KR languages, such as OWL, where more of the semantics is 
buried in the NL comments than in the formal operators.    (032)

However, it does require a lot of training and practice to use such a language 
checker.  And the overwhelming amount of NL data on the WWW has never been and 
never will be checked by such tools.    (033)

PC
> I would expect computers to be able, eventually, to do better than any 
> given pair of annotators at finding the right meaning, provided that 
> the "meanings" are in fact distinguishable.    (034)

I agree with qualifications.  But the two main qualifications are (1) when do 
you expect "eventually" to occur, and (2) what do you mean by "distinguishable" 
meanings.    (035)

For point #1, semi-automated checkers, such as Boeing's and others, can be and 
should be more widely used in conjunction with KR tools.    (036)

Some of the VivoMind software that was developed, paid for, and used for 
practical applications has detected issues that humans, working without such 
aids, missed.  See slides 111 to 157 of    (037)

    http://www.jfsowa.com/talks/goal.pdf    (038)

Tools that use or extend such technologies should be the focus of much more R & 
D than is currently being devoted to them.    (039)

For point #2, every unabridged dictionary has a different set of word senses.  
The number of word senses is constantly growing and changing, and many 
linguists have strong doubts about the possibility of having any precisely 
defined set.    (040)

Alan Cruse made the point that there is no limit to the number of fine-grained 
senses that can be useful.  Slide 54 of goal.pdf (copy below) summarizes and 
illustrates that point.  For more on this issue, see slides 46 to 78 of goal.pdf    (041)

John
________________________________________________________________    (042)

Slide 54 of http://www.jfsowa.com/talks/goal.pdf    (043)

                            MICROSENSES    (044)

The linguist Alan Cruse coined the term microsense for a specialized sense of a 
word in a particular application.    (045)

Examples of microsenses:    (046)

  ● Spatial terms in different situations and points of view.
  ● The many kinds of chairs or numbers in the egg whites.
  ● The kinds of balls in various ball games: baseball, basketball, billiard 
ball, bowling ball, football, golf ball, softball, tennis ball.
  ● Computer science requires precise definitions, but the meanings change 
whenever programs are revised or extended.
  ● Consider the term 'file system' in Unix, Apple OS X, Microsoft Windows, 
and IBM mainframes.    (047)

Microsenses develop through usage in different situations.    (048)

The number and kinds of new uses and innovations grow independently of any 
attempt to limit the meanings of words.    (049)

_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (050)



