Re: [ontolog-forum] Context and Inter-annotator agreement

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: John F Sowa <sowa@xxxxxxxxxxx>
Date: Mon, 05 Aug 2013 12:35:40 -0400
Message-id: <51FFD45C.5030007@xxxxxxxxxxx>
Kingsley and Michael,    (01)

> I learned an entire language without a dictionary. The Edo language
> from Nigeria is part of my heritage. I was born in the UK and my parents
> took me to Nigeria at age 7. I learned to speak the language fluently
> (within 2-3 years) without reading a single book or looking at a single
> dictionary. To this very day, I can understand and speak it fluently,
> but I can't actually write in this language without the aid of
> an Edo dictionary and thesaurus.
> BTW -- during the same period (and beyond) I failed woefully at trying
> to learn French and German (from afar) using dictionaries    (02)

That's a good summary of the way children learn language and of
the hopelessly inadequate way that most adults are taught language.
Word senses are a byproduct of dictionary development.  The number
of senses in any dictionary depends on (a) the number of citations
the lexicographers start with, (b) the amount of time and space they
have, and (c) their preferences for lumping or splitting.    (03)

As you note, dictionary definitions can be helpful for many purposes,
but they are *not* the normal basis for human language use.    (04)

> If I understand your mails and slides correctly, you parse natural language
> to conceptual graphs using a link grammar parser. The words are the concept
> nodes and the relation nodes ("roles") are chosen by the parser from the Verb
> Semantics Ontology.    (05)

There's a lot more detail.  Those two sentences are roughly correct, but
it's important to emphasize that we use a *society* of agents (inspired
by Minsky's book) that operate in parallel.  Following is the original
article that describes the Flexible Modular Framework (FMF):    (06)

    Architectures for intelligent systems    (07)

The FMF evolved quite a bit during the past decade, but that article
describes the basic principles:  message passing among an open-ended
society of heterogeneous agents that operate in parallel.  During
language analysis, the society may have thousands of agents, most
of which are asleep until something that matches the pattern(s)
they recognize wakes them up.    (08)
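As a rough illustration of that wake-on-pattern behavior (not the actual
FMF code; every class and name below is invented for the example), an
agent society with a simple message bus might look like this:

```python
# Toy sketch of an FMF-style society: agents stay "asleep" until an
# input token matches one of their patterns, then post a message to
# a shared bus and go back to sleep.  All names are hypothetical.

class Agent:
    def __init__(self, name, patterns):
        self.name = name
        self.patterns = set(patterns)

    def react(self, token):
        # Wake only if the token matches a pattern this agent recognizes.
        if token in self.patterns:
            return {"from": self.name, "token": token}
        return None  # stay asleep

class MessageBus:
    def __init__(self, agents):
        self.agents = agents

    def broadcast(self, token):
        # In the real framework the agents run in parallel;
        # this sequential loop just stands in for that.
        return [m for a in self.agents
                if (m := a.react(token)) is not None]

bus = MessageBus([
    Agent("verb-agent", {"run", "bank"}),
    Agent("noun-agent", {"bank", "river"}),
])
print(bus.broadcast("bank"))
```

In the real system the society may contain thousands of such agents,
and the "bus" is asynchronous message passing rather than a loop.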

The following point requires a lot more explanation:    (09)

> The concept nodes (words) are usually not mapped to some concept from
> an ontology (disambiguated). So they are usually only constrained
> in their meaning by the text the system gets.    (010)

The "plain vanilla" (very general and underspecified) ontology that we
use is derived from various sources.  The top levels are based on the
KR ontology from the appendix of my Knowledge Representation book.
Most of that is summarized in http://www.jfsowa.com/ontology/ .    (011)

Arun Majumdar, who designed and implemented the software, derived the
VSO ontology for verbs from the IBM-CSLI sources, and he added more
to that web site:  http://lingo.stanford.edu/vso/ .    (012)

For most words, we assume that the spelling of the word itself
is the name of a supertype of its major senses.  Homonyms (such as
river bank vs. financial bank) are considered distinct types.    (013)
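To make that convention concrete (a toy sketch; the type names and the
hierarchy below are invented for the example), the spelling "bank"
names a supertype, and the homonymous senses sit under it as distinct
subtypes:

```python
# Hypothetical type hierarchy: each entry maps a type to its supertype.
# The spelling "Bank" names a supertype; the homonymous senses are
# distinct subtypes under it.

type_hierarchy = {
    "Bank": "Entity",        # supertype named by the spelling itself
    "Bank-River": "Bank",    # river bank
    "Bank-Finance": "Bank",  # financial institution
}

def is_subtype(t, super_t, hierarchy):
    # Walk up the hierarchy from t, looking for super_t.
    while t in hierarchy:
        if t == super_t:
            return True
        t = hierarchy[t]
    return t == super_t

print(is_subtype("Bank-River", "Bank", type_hierarchy))    # both senses
print(is_subtype("Bank-River", "Bank-Finance", type_hierarchy))
```

Both senses test as subtypes of "Bank", but neither is a subtype of
the other, which is what makes the homonyms distinct.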

Each concept or relation type has one or more "canonical graphs",
which are conceptual graphs that specify a pattern of concept and
relation types.  If the type constraints hold, that is a sign of
normal use.  Violations indicate metaphor, metonymy, some innovation
by the author, or some mistake by somebody or something.    (014)
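A toy sketch of how such a check might work (the canonical graph and
type assignments below are invented; real canonical graphs are full
conceptual graphs, and real type checks respect the subtype hierarchy):

```python
# Hypothetical canonical graph for "eat": it expects an Animate agent
# (Agnt) and an Edible patient (Ptnt).  A type violation does not
# reject the sentence; it flags possible metaphor, metonymy, or error.

type_of = {"cat": "Animate", "fish": "Edible", "doubt": "Abstraction"}

canonical_eat = {"Agnt": "Animate", "Ptnt": "Edible"}

def check(graph, canonical):
    violations = []
    for relation, filler in graph.items():
        expected = canonical.get(relation)
        if expected and type_of.get(filler) != expected:
            violations.append((relation, filler, expected))
    return violations

# "The cat ate the fish": all constraints hold -> normal use.
print(check({"Agnt": "cat", "Ptnt": "fish"}, canonical_eat))
# "Doubt ate him": (Agnt doubt) violates Animate -> possible metaphor.
print(check({"Agnt": "doubt", "Ptnt": "fish"}, canonical_eat))
```

An empty result signals normal use; a nonempty one is the cue for the
agents that handle metaphor, metonymy, or error recovery.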

Background knowledge is derived from various sources:  structured
information from formal representations or unstructured information
from NL documents or discourse -- that includes information from
earlier parts (sometimes later parts) of the same text.    (015)

For applications that require high precision (such as the legacy
re-engineering task or the DoE task of extracting information
about chemical compounds), a specialized formal ontology is necessary.
It can often be derived by mapping formal representations to
conceptual graphs (which are also formally defined).  But most of
the words in those documents are *not* defined at the same level
of precision as the specialized ontology.    (016)

We use Cognitive Memory (TM) for storing and finding background
knowledge represented as CGs.  A semantic distance measure is used
to determine the closeness of the match.  And graphs derived from
various sources have an estimate of their accuracy or reliability.
We also use various learning and voting methods for deriving and
revising those estimates -- for both the individual graphs and
the agents that contribute them to the analysis.    (017)
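The actual Cognitive Memory algorithms are proprietary, so the
following is only a toy stand-in for the idea: stored graphs (reduced
here to sets of node labels) are ranked by a semantic-distance
measure, discounted by each graph's reliability estimate:

```python
# Toy illustration only -- NOT the Cognitive Memory algorithm.
# Each stored graph is reduced to its set of node labels, and the
# "semantic distance" is just a Jaccard distance on those sets.

def jaccard_distance(a, b):
    # 1 minus the overlap of node labels; 0.0 means identical sets.
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

memory = [
    # (node labels of a stored graph, reliability estimate in 0..1)
    ({"Cat", "Eat", "Fish"}, 0.9),
    ({"Dog", "Chase", "Cat"}, 0.6),
    ({"Bank", "Loan", "Money"}, 0.8),
]

def retrieve(query, k=2):
    # Lower score is better: a close match from a reliable source.
    def score(item):
        nodes, reliability = item
        return jaccard_distance(query, nodes) / max(reliability, 1e-9)
    return sorted(memory, key=score)[:k]

print(retrieve({"Cat", "Fish"})[0][0])  # the closest stored graph
```

The reliability weight is where the learning and voting methods would
feed in: as estimates are revised, the ranking of matches shifts.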

We also use an open-ended variety of other lexical resources from
various sources.  WordNet is good for what it does, and Roget's
Thesaurus is better for many words (especially adjectives).    (018)

But we do *not* attempt to combine all lexical resources into
a single unified source.  That would be (a) difficult to do,
(b) difficult to maintain, and (c) unnecessary.  We just assign
an agent to each resource.  It wakes up when one of the words
it can handle appears in the input stream.  It sends a message
to the parser with the information it finds and goes back to sleep.    (019)

When we get the time to write more, Arun and I plan to produce
a much more detailed explanation of all these issues.  For now,
the following two articles are useful summaries:    (020)

    Two paradigms are better than one
    and multiple paradigms are even better    (021)

    Future directions for semantic systems    (022)

We also plan to (a) license Cognitive Memory and (b) release
a version of the Flexible Modular Framework as free open source
under a license such as LGPL or Apache.  But that won't happen
until we earn enough money from contracts and have the luxury
of hiring people who can help with the work.    (023)

> This understanding seems to have areas of application (your customers) but
> would you really compare it with human understanding of natural language ?    (024)

Short answer: yes.    (025)

Longer answer: The question of whether AI techniques should be based on
or influenced by ongoing research in psychology and neuroscience has
been debated since the founding workshop on AI in 1956.  Many people
whose opinions I respect, such as Marvin Minsky, Sydney Lamb, and
others, say "yes" but with qualifications.    (026)

I strongly believe in the importance of relating NLP methods to the
research in all branches of cognitive science.  I did that in my 1984
book (Conceptual Structures:  Information Processing in Mind and
Machine), in many talks and publications since then, and in the
choice of methods we use at our company VivoMind Research.    (027)

Arun Majumdar, who invented the algorithms for Cognitive Memory, was
inspired by what I wrote in Chapter 2 on cognitive psychology in my
1984 book.  He developed the basic methods before we met.    (028)

For an even longer answer, see http://www.jfsowa.com/talks/goal.pdf ,
http://www.jfsowa.com/talks/relating.pdf , and the many references
cited at the bottom of various slides and in the final slides.    (029)

John    (030)

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (031)
