
Re: [ontolog-forum] Context and Inter-annotator agreement

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Patrick Cassidy" <pat@xxxxxxxxx>
Date: Mon, 5 Aug 2013 13:34:57 -0400
Message-id: <008501ce9202$1b0f4ae0$512de0a0$@micra.com>
John,
   On a couple of points:

[JFS] >Pat,
 >
 >The essential point is that people do *not* require prior definitions of
 >word senses in order to understand the words in a conversation or a document.
      That is occasionally true, but rarely.  People can only learn a *very
small* number of words at a time from the surrounding context.  That is why
breaking codes is such a complex job - people can't disambiguate multiple
sequential unknown words in their heads on a first reading.  The context in
written messages is the surrounding text, and unless the meanings of most of
the words are already known, it will be a difficult code-breaking job to
infer more than a few new ones.  So people do indeed require prior learned
understandings of most words to be able to grasp text meanings and infer
(more or less accurately) the meanings of new words.  Remember how long it
took, and how much effort, to decipher Linear B?  And that text was intended
to be clear, not obscure.

     Do you have in mind some measure of how many word meanings can be
inferred accurately from context?  In my experience, it is rather small.  I
recall when I was rather young (5 or 6) watching TV and being told
periodically by the station that they would "interrupt this broadcast" for
"station identification".  My similar-age brother and sister and I puzzled
over this and agreed, from the sound of the word and the *context*, that
"identification" was some kind of "vacation".  My father's explanation made
it clear that this was not the meaning, but we couldn't quite figure out
exactly what he meant either.  You need *a lot* of context (and often
prior knowledge) to clearly grasp the meaning of a new word, and in written
communication that means that you have to already know the meanings of most
of the words in that text.
     Of course, no one has to look up word definitions in a dictionary, but
they must have learned the meaning (from context or a definition) and stored
it in their mind before being able to understand texts that do not
have sufficient context to disambiguate that word - which, from my own
observations, will probably be most text.  In my own experience, when
encountering new words in a text, if they are not actually defined, at best
I can make a guess as to the generic *type* of thing they are, but almost
never understand how they differ from other things of that type.  That has
to be learned explicitly, unless it is also specified in the text.

   So, yes, people do require prior "definitions" (i.e. meanings gained
from prior experience in some form) to understand most text.  If you have
references to actual experimental tests of this issue - how much context is
needed to learn how much about a new word - I would appreciate the pointer.

[PC] >> I want to be able to build a machine that can understand and
 >> fluently talk to a 6-year old native speaker of English.  As you have
 >> noted, that is itself quite challenging.
 >
[JFS] >Up to that point, we agree.
 >
[PC] >> but requires a vocabulary of only 5-10 thousand word senses.
 >
[JFS] >No!  Children at the age of 3 or 4 are learning dozens of words per
 >day.

     Well, I have seen various estimates of the rate at which children
learn words, but the estimates I have seen for the typical vocabulary of a
six-year-old (it depends on upbringing) range from 5,000 to 15,000 words.
One 1902 reference listed the observed spoken vocabulary of a six-year-old
boy, giving a list of 2,238 "root" words (not syntactic variants).  He
probably understood more words than he used.  Do you have references to
other specific lists?  They would be really useful for my work.  The higher
figure of 15,000 would average about 10 words per day learned from ages 2
to 6.  At that age, most words will have a single sense; more subtlety
comes later.  Contextual learning depends a great deal on the unknown word,
even for concrete objects, which are only about 10 percent of a kid's
vocabulary.  And, yes, kids do learn a lot of their new words by being told
what they mean by the speaker, not just from context.  I have seen
descriptions of experiments showing that context can give one a vague
feeling for what a word might mean - much less precise than a definition.
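     The arithmetic behind that words-per-day figure is easy to check.  A
minimal sketch (the vocabulary sizes are the rough estimates quoted above,
not measured data):

```python
# Rough words-per-day estimate for vocabulary growth between ages 2 and 6.
# The 5,000 and 15,000 figures are the rough estimates discussed above,
# not measured data.

def words_per_day(vocab_size, start_age_years=2, end_age_years=6):
    """Average number of new words learned per day over the age span."""
    days = (end_age_years - start_age_years) * 365
    return vocab_size / days

low = words_per_day(5000)     # low-end vocabulary estimate
high = words_per_day(15000)   # high-end vocabulary estimate
print(round(low, 1), round(high, 1))  # roughly 3.4 and 10.3 words per day
```

So even the high-end estimate works out to about 10 new words per day, not
dozens.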

[JFS] >As I recall from an email discussion we had a few years ago, I cited
 >examples from Longmans Dictionary that showed how their primitives shifted
 >from one use to another.  Their definitions are useful for humans because
 >people fill in the gaps with their own background knowledge.  But computers
 >don't have that background.

    Yes, and one of the challenges in developing a primitives-based
foundation ontology is to determine, by constructing logical specifications
of word meanings, just how many senses of the basic words are required to
account for all of the needed primitive senses used in definitions.  Thus
far it appears that *on average* fewer than three senses per word are
required, and perhaps fewer than two.

   *But* one interesting thing about the Longman defining vocabulary is
that, though you can quibble with how precise any given Longman definition
is, if you decide to create a more precise definition for your own purpose,
you can still do it *using the same defining vocabulary*.  The terseness of
many Longman definitions is just part of a "good enough for the intended
audience" policy - but you can also make logical specifications that are
good enough for computers using the same set of basic concepts (in
ontological form).  The average multiplicity of word senses for such usage
is still fewer than three.
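     That "fewer than three senses per word" figure is just an average over
a sense inventory.  A toy sketch of how one might compute it for a defining
vocabulary (the miniature lexicon here is invented for illustration; a real
count would use the sense inventory built while writing the logical
specifications):

```python
# Toy computation of average sense multiplicity over a defining vocabulary.
# The miniature lexicon below is invented for illustration only.

defining_vocab_senses = {
    # word: the sense labels actually needed in definitions
    "run":   ["move-fast", "operate", "flow"],
    "bank":  ["financial-institution", "river-edge"],
    "set":   ["collection", "put-in-place"],
    "table": ["furniture"],
    "on":    ["supported-by", "about"],
}

total_senses = sum(len(senses) for senses in defining_vocab_senses.values())
average = total_senses / len(defining_vocab_senses)
print(average)  # 2.0 senses per word for this toy lexicon
```

The claim above is that when this count is done over the full defining
vocabulary, the average stays below three.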

Pat

Patrick Cassidy
MICRA Inc.
cassidy@xxxxxxxxx
1-908-561-3416


 >-----Original Message-----
 >From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-
 >bounces@xxxxxxxxxxxxxxxx] On Behalf Of John F Sowa
 >Sent: Monday, August 05, 2013 4:46 AM
 >To: ontolog-forum@xxxxxxxxxxxxxxxx
 >Subject: Re: [ontolog-forum] Context and Inter-annotator agreement
 >
 >Pat,
 >
 >The essential point is that people do *not* require prior definitions of
 >word senses in order to understand the words in a conversation or a
 >document.  Both children *and* adults use familiar words in new ways
 >constantly.  Metaphor, metonymy, and shifts in the Wittgensteinian word
 >games are constantly creating new senses.
 >
 >Just look at the professional lexicographers who create dictionaries.
 >They analyze *citations* of sentences that use the words.  Then they group
 >the citations according to similarity in *use*, and they write definitions
 >for each group.
 >
 >Fundamental principle:  Word senses are artificial creations by
 >lexicographers.  People learn meaning through use, and they are completely
 >unaware that they are shifting the meaning when they use the same word in
 >different "language games".
 >
 >> I want to be able to build a machine that can understand and fluently
 >> talk to a 6-year old native speaker of English.  As you have noted,
 >> that is itself quite challenging.
 >
 >Up to that point, we agree.
 >
 >> but requires a vocabulary of only 5-10 thousand word senses.
 >
 >No!  Children at the age of 3 or 4 are learning dozens of words per day.
 >They don't look up words in the dictionary.  They may occasionally ask
 >somebody about a "hard" word, but they learn words from context without
 >even thinking about them.
 >
 >When I was studying Latin in high school, I dutifully looked up every new
 >word.  But in the second year, I got lazy.  I only looked up a few words
 >when I got completely lost.  And guess what?  I learned Latin much faster
 >and more thoroughly when I stopped looking up words.
 >
 >> I can identify a set of primitives with which I can define almost
 >> anything
 >
 >I have studied the definitions by Anna Wierzbicka, Margaret Masterman, and
 >others who claim to use a small set of primitives.  They look good on the
 >surface.  But when you analyze them in detail, you discover that they are
 >using their so-called primitives in a very sloppy way.  The meanings of
 >the primitives have subtle and sometimes not-so-subtle shifts in different
 >definitions.
 >
 >As I recall from an email discussion we had a few years ago, I cited
 >examples from Longmans Dictionary that showed how their primitives shifted
 >from one use to another.  Their definitions are useful for humans because
 >people fill in the gaps with their own background knowledge.  But computers
 >don't have that background.
 >
 >John
 >


_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
