John and David,
Some follow-up: (01)
[JFS] > There is no way that an author or speaker could write or speak
fluently about any subject while constantly thinking about which word sense to
select for every word.
>
* No, that would be too unnatural to ever work efficiently. (02)
[JFS] > But it is possible, for texts where clarity and precision are critical,
> for the author to use tools that can help detect ambiguity, avoid words that
> could be problematical, and suggest simpler syntax for phrases that are
> overly complex. (03)
* Yes, that was the intended implication. But what I had in mind goes
beyond use of controlled natural language, useful as that is: (04)
[JFS]
>> The Boeing Simplified English Checker helps writers comply with ASD
>> Simplified Technical English (STE), developed by the AeroSpace and
>> Defence Industries Association of Europe.
>
> The result of such a checker is much easier for non-native speakers to
> understand. For aircraft maintenance, that is critical for manuals used by
> workers at airports around the world. (05)
* For the more general case of making texts sufficiently unambiguous for an
NLU program to understand, the NLU program itself would have to be part of the
loop in developing the set of senses that it can handle. And since NLU programs
still have a long way to go to achieve the necessary performance, this implies
a progressive, iterative effort to develop (at least one) NLU program together
with a set of senses it can understand, so as to achieve human-level
interpretation of a broad range of texts. Such iterative development is called
a "spiral" (upward, implied) in some circles. (06)
>> [PC] I would expect computers to be able, eventually, to do better than any
>> given pair of annotators at finding the right meaning, provided that
>> the "meanings" are in fact distinguishable.
>
> [JFS] I agree with qualifications. But the two main qualifications are (1)
> when do you expect "eventually" to occur, (07)
* That depends strongly on how much funding (government or private) is
directed toward the goal. (08)
> [JFS]
> and (2) what do you mean by "distinguishable" meanings (09)
* Those meanings that can be reliably distinguished (>98% agreement) by
motivated (rewarded for accuracy) human annotators. (010)
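That kind of reliability check is a small computation. A toy Python example
(the sense tags for "bank" are made up; Cohen's kappa is shown alongside raw
agreement because it corrects for chance):

    from collections import Counter

    def raw_agreement(tags_a, tags_b):
        """Fraction of items on which two annotators chose the same sense."""
        return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)

    def cohen_kappa(tags_a, tags_b):
        """Chance-corrected agreement: 1.0 is perfect, 0.0 is chance level."""
        n = len(tags_a)
        observed = raw_agreement(tags_a, tags_b)
        freq_a, freq_b = Counter(tags_a), Counter(tags_b)
        expected = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
        return (observed - expected) / (1 - expected)

    # Toy data: two annotators tagging the word "bank" in ten sentences.
    a = ["river", "money", "money", "river", "money",
         "money", "river", "money", "money", "river"]
    b = ["river", "money", "money", "river", "money",
         "river", "river", "money", "money", "river"]
    print(raw_agreement(a, b))   # 0.9 -- still short of the 98% bar
    print(cohen_kappa(a, b))     # 0.8 after correcting for chance

A serious effort would use many more items and more than two annotators, but
the 98% bar would be applied to numbers computed in essentially this way.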
[JFS] > For point #2, every unabridged dictionary has a different set of word
senses. The number of word senses is constantly growing and changing, and many
linguists have strong doubts about the possibility of having any precisely
defined set. (011)
* Right, right, right! Every month, valley girls and students in college
bull sessions are inventing new senses for words that (thankfully) never
penetrate into the serious texts that are intended for conveying meaningful
information. I would expect any program for automatic language understanding
and sense-disambiguation to focus on the texts (and sets of meanings in them)
that are significant enough to warrant spending money for the purpose of
properly interpreting them and archiving the semantic information contained. I
would expect this set of senses to be a small fraction of all the senses
identifiable by lexicographers, but it will still be a very large and vitally
important semantic lexicon, equivalent to that of a college-educated (but not
omniscient) adult. [I once saw a comment about some significant number of
word senses that appear only in Shakespeare - not likely candidates]. The goal
of human-level performance is reasonable (there is an existence proof), and
does not require an unachievable infinite capability. (012)
[JFS] > However, it does require a lot of training and practice to use such a
language checker. (013)
* Perhaps, but that would depend on how well the program is written. And
(see below) motivation will be key. (014)
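For what it is worth, the core of such a checker need not be exotic. A toy
sketch follows - this is not the Boeing checker's logic; the approved-word list
and the 20-word sentence limit are just stand-ins for a real controlled-language
specification such as STE:

    import re

    APPROVED = {"open", "close", "the", "valve", "before", "you", "start",
                "engine", "do", "not", "use", "tool", "this", "a", "an"}
    MAX_WORDS = 20

    def check(text):
        """Return warnings for unapproved words and overlong sentences."""
        warnings = []
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            words = re.findall(r"[A-Za-z']+", sentence.lower())
            if len(words) > MAX_WORDS:
                warnings.append(f"Sentence too long ({len(words)} words): {sentence!r}")
            for w in words:
                if w not in APPROVED:
                    warnings.append(f"Word not in approved vocabulary: {w!r}")
        return warnings

    for w in check("Close the valve before you start the engine. "
                   "Utilize the apparatus judiciously."):
        print(w)

A real checker adds grammar and style rules and ties each approved word to a
single approved meaning, as STE does, but the basic shape is the same:
mechanical tests plus a human author in the loop.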
[JFS] > And the overwhelming amount of NL data on the WWW has never been and
never will be checked by such tools. (015)
* Right, and few people will lament the non-inclusion of the voluminous
garbage cluttering the Web in a database of supposed "facts". (016)
[DE]
> I'd be willing to argue that it's the "secondary" (#2 & onwards) readers of
> text who would be more highly motivated to annotate ambiguous terms. (017)
* Perhaps, but in any case motivation of the creators of text to use the
annotating tools will be vital to success. There are at least two identifiable
sets of motivated text-creators: (1) those who are serious about disseminating
meaningful information, for public interest or any other purpose (e.g. college
teachers and some contributors to Wikipedia); and (2) those in organizations
that want (some of) their texts to be automatically interpretable, in-house or
more widely. In the latter case, the organizations would have to set up
institutional motivators - at a minimum, recognizing where appropriate that
such annotation (most likely voluntary) is part of the job and allowing the
time needed for it. Bonuses for accuracy of annotation might further improve
the motivation. (018)
There is a great deal of discussion in government agencies about wanting
"interoperability", and there are various programs intended to achieve that
goal, at least in part. When the agencies understand how much it will cost to
actually achieve interoperability at a human level of interpretation, they
will have to decide whether it is worth the effort. But I am convinced that
the effort is perfectly feasible if adequate resources are applied. Among
those resources, a well-tested semantic lexicon with an adequate set of
distinguishable senses is, I believe, both vital and achievable. For now,
outside of controlled vocabularies, WordNet, with all its known problems, is
the only thing that I have seen used. We can do a lot better. (019)
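To be concrete about the WordNet point: its sense inventory is easy to get at
programmatically. The sketch below uses NLTK's WordNet interface (it assumes
nltk and its wordnet data are installed), and listing the senses of one common
word shows both why WordNet is the default choice and why its fine granularity
feeds the agreement problems discussed above:

    import nltk
    from nltk.corpus import wordnet as wn

    nltk.download("wordnet", quiet=True)   # fetch the WordNet data on first use

    # Every noun sense WordNet records for "bank"; these fine-grained
    # distinctions are what make high inter-annotator agreement on
    # WordNet senses so hard to reach.
    for synset in wn.synsets("bank", pos=wn.NOUN):
        print(synset.name(), "-", synset.definition())

A better lexicon for this purpose would merge or drop many of those
distinctions, keeping only the senses that motivated annotators can reliably
tell apart.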
Pat (020)
Patrick Cassidy
MICRA Inc.
cassidy@xxxxxxxxx
1-908-561-3416 (021)
-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John F Sowa
Sent: Wednesday, July 31, 2013 10:29 AM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] Context and Inter-annotator agreement (022)
Pat, (023)
I agree with the last sentence of the following comment. I agree with the
first, but I would qualify the word 'know'. The middle sentence is more
problematical. (024)
PC
> The only ones who really know the "meaning" of a word are the ones who
> created the text. It would not be too difficult to have creators of
> text label the senses that they intend, and through a series of
> iterations, find a set of senses that text creators and text
> annotators can agree on with a precision that would satisfy Miss
> Elliott. I find little interest at NLP meetings for work of that kind. (025)
There is no way that an author or speaker could write or speak fluently about
any subject while constantly thinking about which word sense to select for
every word. (026)
But it is possible, for texts where clarity and precision are critical, for the
author to use tools that can help detect ambiguity, avoid words that could be
problematical, and suggest simpler syntax for phrases that are overly complex. (027)
Note the following passage from the Boeing web site: (028)
Source: http://www.boeing.com/boeing/phantom/sechecker/checker.page
> A language checker is a software application that helps authors comply
> with a controlled-language specification. Examples of controlled
> languages include ASD Simplified Technical English, Attempto
> Controlled English, Caterpillar Technical English, Global English and
> the U.S. government's Plain Language specification.
>
> The Boeing Simplified English Checker helps writers comply with ASD
> Simplified Technical English (STE), developed by the AeroSpace and
> Defence Industries Association of Europe. (029)
The result of such a checker is much easier for non-native speakers to
understand. For aircraft maintenance, that is critical for manuals used by
workers at airports around the world. (030)
That result isn't sufficiently precise to be translated to logic, but it is
usually easier to translate to other NLs by automated tools. (031)
It can also be a first step toward a semi-automated generation of some
knowledge representation language. It would also be very useful for checking
the comments in KR languages, such as OWL, where more of the semantics is
buried in the NL comments than in the formal operators. (032)
However, it does require a lot of training and practice to use such a language
checker. And the overwhelming amount of NL data on the WWW has never been and
never will be checked by such tools. (033)
PC
> I would expect computers to be able, eventually, to do better than any
> given pair of annotators at finding the right meaning, provided that
> the "meanings" are in fact distinguishable. (034)
I agree with qualifications. But the two main qualifications are (1) when do
you expect "eventually" to occur, and (2) what do you mean by "distinguishable"
meanings. (035)
For point #1, semi-automated checkers, such as Boeing's and others, can be and
should be more widely used in conjunction with KR tools. (036)
Some of the VivoMind software that was developed, paid for, and used for
practical applications has detected issues that humans, working without such
aids, missed. See slides 111 to 157 of (037)
http://www.jfsowa.com/talks/goal.pdf (038)
Tools that use or extend such technologies should be the focus of much more R &
D than is currently being devoted to them. (039)
For point #2, every unabridged dictionary has a different set of word senses.
The number of word senses is constantly growing and changing, and many
linguists have strong doubts about the possibility of having any precisely
defined set. (040)
Alan Cruse made the point that there is no limit to the number of fine-grained
senses that can be useful. Slide 54 of goal.pdf (copy below) summarizes and
illustrates that point. For more on this issue, see slides 46 to 78 of goal.pdf (041)
John
________________________________________________________________ (042)
Slide 54 of http://www.jfsowa.com/talks/goal.pdf (043)
MICROSENSES (044)
The linguist Alan Cruse coined the term microsense for a specialized sense of
a word in a particular application. (045)
Examples of microsenses: (046)
● Spatial terms in different situations and points of view.
● The many kinds of chairs or numbers in the egg whites.
● The kinds of balls in various ball games: baseball, basketball, billiard
ball, bowling ball, football, golf ball, softball, tennis ball.
● Computer science requires precise definitions, but the meanings change
whenever programs are revised or extended.
● Consider the term 'file system' in Unix, Apple OS X, Microsoft Windows,
and IBM mainframes. (047)
Microsenses develop through usage in different situations. (048)
The number and kinds of new uses and innovations grow independently of any
attempt to limit the meanings of words. (049)
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J (051)