
Re: [ontolog-forum] Context and Inter-annotator agreement

To: "[ontolog-forum] " <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Michael Brunnbauer <brunni@xxxxxxxxxxxx>
Date: Fri, 9 Aug 2013 18:31:18 +0200
Message-id: <20130809163117.GA20437@xxxxxxxxxxxx>

Hello John,    (01)

A while ago, I experimented with the Enju HPSG natural language parser
from the University of Tokyo: http://www.nactem.ac.uk/tsujii/enju/    (02)

I was able to generate graphs from natural language that look like this:    (03)


http://www.imagesnippets.com/imgtag/images/brunni@xxxxxxxxxxxx/Bildschirmfoto2.html    (04)

The labeled edges here are simplifications for better visualization.
"Disappear -from-> camp" is really a "from" node connected to both
"disappear" and "camp", with an ordering on the two connections - a binary
relation, if you like.    (05)
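
In code, one way to store such an edge is as a node of its own with
ordered connections. A minimal sketch (the names are mine, not Enju's
actual output format):

    from dataclasses import dataclass, field

    # Minimal sketch: an "edge label" like "from" is itself a node whose
    # ordered args record which end is which.
    @dataclass
    class Node:
        label: str                                # "disappear", "camp", "from"
        args: list = field(default_factory=list)  # ordered connections

    disappear = Node("disappear")
    camp = Node("camp")
    frm = Node("from", args=[disappear, camp])    # disappear -from-> camp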

Then I had the idea of treating such graphs - with concrete or variable
words - as statistical events and using them in a Naive Bayes classifier.
It was a bit tricky to avoid using events that are clearly not independent
(because of variables and shared subgraphs), but we managed.    (06)
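
Here is a stripped-down sketch of that idea in Python - illustrative
only, not the code we actually used; event extraction and smoothing are
simplified:

    import math
    from collections import defaultdict

    # Naive Bayes over graph-derived "events" (subgraphs rendered as
    # hashable strings). Bernoulli-style: only the presence of an event
    # counts, with add-one smoothing.
    class NaiveBayes:
        def __init__(self):
            self.class_counts = defaultdict(int)
            self.event_counts = defaultdict(lambda: defaultdict(int))
            self.vocab = set()

        def train(self, events, label):
            self.class_counts[label] += 1
            for e in set(events):                 # de-duplicate per document
                self.event_counts[label][e] += 1
                self.vocab.add(e)

        def classify(self, events):
            total = sum(self.class_counts.values())
            best, best_lp = None, float("-inf")
            for label, n in self.class_counts.items():
                lp = math.log(n / total)
                for e in set(events) & self.vocab:
                    lp += math.log((self.event_counts[label][e] + 1) / (n + 2))
                if lp > best_lp:
                    best, best_lp = label, lp
            return best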

It turned out that classifiers with a maximum event size of 1 (one word)
performed as well as classifiers with higher maximum event sizes.    (07)

Other people have successfully built Bayes classifiers that use pairs or
triples of neighboring words, but the gain drops quickly with each extra
word.    (08)
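
For comparison, extracting those neighboring-word events is trivial
(a hypothetical helper, just to make the setup concrete):

    # Word n-gram "events" of sizes 1..max_size from a token list.
    # With max_size=1 this reduces to the one-word case above, and the
    # output can feed NaiveBayes.train directly.
    def ngram_events(tokens, max_size):
        return [" ".join(tokens[i:i + n])
                for n in range(1, max_size + 1)
                for i in range(len(tokens) - n + 1)]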

What you do is quite different from what I tried, but it is interesting
that you get high value out of looking at groups of semantically
connected words.    (09)

Can you say something about the performance of your system in relation
to the maximum canonical graph size?    (010)

Regards,    (011)

Michael Brunnbauer    (012)

On Mon, Aug 05, 2013 at 12:35:40PM -0400, John F Sowa wrote:
> Kingsley and Michael,
> 
> KI
> > I learned an entire language without a dictionary. The Edo language
> > from Nigeria is part of my heritage. I was born in the UK and my parents
> > took me to Nigeria at age 7. I learned to speak the language fluently
> > (within 2-3 years) without reading a single book or looking at a single
> > dictionary. To this very day, I can understand and speak it fluently,
> > but I can't actually write in this language without the aid of
> > an Edo dictionary and thesaurus.
> >
> > BTW -- during the same period (and beyond) I failed woefully at trying
> > to learn French and German (from afar) using dictionaries
> 
> That's a good summary of the way children learn language and of
> the hopelessly inadequate way that most adults are taught language.
> Word senses are a byproduct of dictionary development.  The number
> of senses in any dictionary depends on (a) the number of citations
> the lexicographers start with, (b) the amount of time and space they
> have, and (c) their preferences for lumping or splitting.
> 
> As you note, dictionary definitions can be helpful for many purposes,
> but they are *not* the normal basis for human language use.
> 
> MB
> > If I understand your mails and slides correctly, you parse natural language
> > to conceptual graphs using a link grammar parser. The words are the concept
> > nodes, and the relation nodes ("roles") are chosen by the parser from the
> > Verb Semantics Ontology.
> 
> There's a lot more detail.  Those two sentences are roughly correct, but
> it's important to emphasize that we use a *society* of agents (inspired
> by Minsky's book) that operate in parallel.  Following is the original
> article that describes the Flexible Modular Framework (FMF):
> 
>     http://www.jfsowa.com/pubs/arch.htm
>     Architectures for intelligent systems
> 
> The FMF evolved quite a bit during the past decade, but that article
> describes the basic principles:  message passing among an open-ended
> society of heterogeneous agents that operate in parallel.  During
> language analysis, the society may have thousands of agents, most
> of which are asleep until something that matches the pattern(s)
> they recognize wakes them up.
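>
> As a rough analogy -- this is not the FMF code, just a toy Python
> illustration of the wake-on-pattern idea:
>
>     import re
>
>     # Toy message bus: an agent "sleeps" (is never called) until a
>     # message matches its pattern. The real FMF runs agents in
>     # parallel and passes structured messages, not strings.
>     class Agent:
>         def __init__(self, pattern, handler):
>             self.pattern = re.compile(pattern)
>             self.handler = handler
>
>     class Bus:
>         def __init__(self):
>             self.agents = []
>         def register(self, agent):
>             self.agents.append(agent)
>         def publish(self, message):
>             for a in self.agents:
>                 if a.pattern.search(message):
>                     a.handler(message)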
> 
> The following point requires a lot more explanation:
> 
> MB
> > The concept nodes (words) are usually not mapped to some concept from
> > an ontology (disambiguated). So they are usually only constrained
> > in their meaning by the text the system gets.
> 
> The "plain vanilla" (very general and underspecified) ontology that we
> use is derived from various sources.  The top levels are based on the
> KR ontology from the appendix of my Knowledge Representation book.
> Most of that is summarized in http://www.jfsowa.com/ontology/ .
> 
> Arun Majumdar, who designed and implemented the software, derived the
> VSO ontology for verbs from the IBM-CSLI sources, and he added more
> to that web site:  http://lingo.stanford.edu/vso/ .
> 
> For most word senses, we assume that the spelling of the word itself
> is the name of a supertype of its major senses.  Homonyms (such as
> river bank vs financial bank) are considered distinct.
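>
> For example, an illustrative encoding (not our internal representation):
>
>     # Polysemous senses share the spelling as their supertype;
>     # homonyms are distinct types with no shared supertype.
>     supertype = {
>         "drive.vehicle": "drive",
>         "drive.motivate": "drive",
>         "bank.river": None,
>         "bank.financial": None,
>     }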
> 
> Each concept or relation type has one or more "canonical graphs",
> which are conceptual graphs that specify a pattern of concept and
> relation types.  If the type constraints hold, that is a sign of
> normal use.  Violations indicate metaphor, metonymy, some innovation
> by the author, or some mistake by somebody or something.
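>
> A toy version of that check -- canonical graphs are really conceptual
> graphs, not dictionaries, so this only illustrates the type test:
>
>     # Expected argument types for a verb, and a conformance check.
>     # A failed check is a signal (metaphor, metonymy, innovation,
>     # or error), not a hard rejection.
>     canonical = {"eat": {"agent": "Animate", "theme": "Food"}}
>     supertypes = {"Dog": {"Animate"}, "Cake": {"Food"}, "Idea": set()}
>
>     def conforms(verb, roles):
>         expected = canonical[verb]
>         return all(expected[role] in supertypes.get(t, set())
>                    for role, t in roles.items())
>
>     conforms("eat", {"agent": "Dog", "theme": "Cake"})   # True: normal use
>     conforms("eat", {"agent": "Idea", "theme": "Cake"})  # False: flag it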
> 
> Background knowledge is derived from various sources:  structured
> information from formal representations or unstructured information
> from NL documents or discourse -- that includes information from
> earlier parts (sometimes later parts) of the same text.
> 
> For applications that require high precision (such as the legacy
> re-engineering task or for the DoE task of extracting information
> about chemical compounds), a specialized formal ontology is necessary.
> But it can often be derived by mapping formal representations to
> conceptual graphs (which are also formally defined).  Most of the
> words in those documents, however, are *not* defined at the same
> level of precision as the specialized ontology.
> 
> We use Cognitive Memory (TM) for storing and finding background
> knowledge represented as CGs.  A semantic distance measure is used
> to determine the closeness of the match.  And graphs derived from
> various sources have an estimate of their accuracy or reliability.
> We also use various learning and voting methods for deriving and
> revising those estimates -- for both individual graphs and for
> the agents that contribute those graphs to the analysis.
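>
> Cognitive Memory itself is proprietary, but a crude stand-in for a
> semantic distance between graphs (just to fix the idea) could be:
>
>     # Toy distance: 1 minus the Jaccard overlap of two graphs' edge
>     # sets. The actual measure is far more sophisticated.
>     def distance(g1, g2):
>         e1, e2 = set(g1), set(g2)
>         union = e1 | e2
>         return 1.0 - len(e1 & e2) / len(union) if union else 0.0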
> 
> We also use an open-ended variety of other lexical resources from
> various sources.  WordNet is good for what it does, and Roget's
> Thesaurus is better for many words (especially adjectives).
> 
> But we do *not* attempt to combine all lexical resources into
> a single unified source.  That would be (a) difficult to do,
> (b) difficult to maintain, and (c) unnecessary.  We just assign
> an agent to each resource.  It wakes up when one of the words
> it can handle appears in the input stream.  It sends a message
> to the parser with the information it finds and goes back to sleep.
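>
> In the toy bus sketched above, such a resource agent might look like
> this (hypothetical, of course):
>
>     # One agent per lexical resource: it wakes only on words it knows,
>     # sends what it finds to the parser, and goes back to sleep.
>     class ResourceAgent:
>         def __init__(self, lexicon, send_to_parser):
>             self.lexicon = lexicon        # word -> entry in this resource
>             self.send = send_to_parser    # callback to the parser
>         def on_word(self, word):
>             if word in self.lexicon:
>                 self.send(word, self.lexicon[word])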
> 
> When we get the time to write more, Arun and I plan to produce
> a much more detailed explanation of all these issues.  For now,
> the following two articles are useful summaries:
> 
>     http://www.jfsowa.com/pubs/paradigm.pdf
>     Two paradigms are better than one
>     and multiple paradigms are even better
> 
>     http://www.jfsowa.com/pubs/futures.pdf
>     Future directions for semantic systems
> 
> We also plan to (a) license Cognitive Memory and (b) release
> a version of the Flexible Modular Framework as free open source
> under a license such as LGPL or Apache.  But that won't happen
> until we earn enough money from contracts and have the luxury
> of hiring people who can help with the work.
> 
> MB
> > This understanding seems to have areas of application (your customers), but
> > would you really compare it with human understanding of natural language?
> 
> Short answer: yes.
> 
> Longer answer: The question of whether AI techniques should be based on
> or influenced by ongoing research in psychology and neuroscience has
> been debated since the founding workshop on AI in 1956.  Many people
> whose opinions I respect, such as Marvin Minsky, Sydney Lamb, and
> others, say "yes" but with qualifications.
> 
> I strongly believe in the importance of relating NLP methods to the
> research in all branches of cognitive science.  I did that in my 1984
> book (Conceptual Structures:  Information Processing in Mind and
> Machine), in many talks and publications since then, and in the
> choice of methods we use at our company VivoMind Research.
> 
> Arun Majumdar, who invented the algorithms for Cognitive Memory, was
> inspired by what I wrote in Chapter 2 on cognitive psychology in my
> 1984 book.  He developed the basic methods before we met.
> 
> For an even longer answer, see http://www.jfsowa.com/talks/goal.pdf ,
> http://www.jfsowa.com/talks/relating.pdf , and the many references
> cited at the bottom of various slides and in the final slides.
> 
> John

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail brunni@xxxxxxxxxxxx
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel    (014)



_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (01)
