[ontolog-forum] Semantics of Natural Languages

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Sun, 28 Aug 2011 15:47:45 -0400
Message-id: <4E5A9B61.6050909@xxxxxxxxxxx>
Doug, Rich, and Azamat,    (01)

I changed the subject line because there are some fundamental
differences in our assumptions and ways of working.  We're going
off in totally different directions that are unlikely to converge.
Before going further, we should review where we plan to go and how.    (02)

Doug worked at Cyc, and he understands the Cyc ontology, knowledge
base, and methods of reasoning.  He sat down and wrote a precise,
well-defined microtheory based on the Cyc ontology.  It defines
a set of terms that we have been discussing in a way that a system
such as Cyc could use to interpret sentences about those terms,
draw inferences, and answer questions.    (03)

Rich started the thread for a self-interest ontology because he
wanted to address questions about how governments work:  How are
the laws, policies, and regulations of a government related to
the needs and interests of the people in a community?    (04)

> It seems to me that Doug’s initial ontology is at the Theory level...
> Perhaps instead we should... try to make progress elaborating Doug's
> formulation by experiment, observation or classification,
> but in a more focused manner.
> I suggest... that we consider US patent specifications as the narrow
> class of concise situation descriptions, problem statements within
> that situation, and claimed embodiments of solutions.    (05)

Azamat has a much grander goal of trying to specify a global ontology
of everything.  He wants something more prescriptive than descriptive:    (06)

> Nowadays, the scope of human and national interest is formed by
> domineering politics, ideology, or commercial concerns...
> Now a sensible ontology of self-interest is supposed to raise the
> eco-awareness, eco-interest as well, to help  people see their
> responsibility for the environment...    (07)

I sympathize with all three of these goals.  But I'd like to describe
what we've been doing at our VivoMind company.  We have some good
tools, but we only have a very small group, and we don't have any
research funding.  We have to work on projects our clients will pay
us to do, and we have to deliver the kinds of results they want with
whatever budget they're willing to approve.  But at the same time,
we're trying to develop our technology in ways that can solve very
challenging problems of natural language understanding.    (08)

I agree with Rich "that Doug’s initial ontology is at the theory level."
In fact, that is one of the complaints about Cyc that I have discussed
with Lenat since the 1990s.  But I strongly disagree with the word
'narrow' in Rich's phrase:  "the narrow class of concise situation
descriptions, problem statements within that situation, and claimed
embodiments of solutions."    (09)

I realize that Rich has done a lot of work with patents, but there's
a huge difference between the narrow conventions for stating the
patent application and the unbounded subject matter of all the things
that could be patented.  Furthermore, the patent lawyers try to make
the invention sound novel by deliberately using terminology that is
based on standard terms for the context, but different in detail
from anything that anyone else has done.    (010)

It's conceivable that an ontology of self-interest might relate
the inventor's self-interest to the parts of a patent application.
But that would also be "at the theory level", not at the level
that any client is likely to pay somebody to do.    (011)

Before doing any work on patents, I want to see an actual problem
that a real customer with real money is willing to pay somebody
to solve.  Give us some patents or patent applications and a
specific problem that some paying client wants to solve.    (012)

As for the environment, I would strongly support any work that could
help preserve the environment, wildlife, etc.  But an ontology has
to clarify the subject matter.  It should not have any built-in
value judgments about what should or should not be done.    (013)

Finally, I'd like to mention something about the level of ontology
that we (at VivoMind) have found most useful.  For examples of
the applications we have worked on, see the following slides:    (014)

    http://www.jfsowa.com/talks/pursue.pdf    (015)

For an excerpt of the kind of "English" involved, see slide 27.  Trying to
translate that text directly to CycL (or any other kind of logic)
would be impossible.  Instead, note the method outlined in slide 26:    (016)

> Much easier task:
> ● Translate the COBOL programs to conceptual graphs.
> ● Use the conceptual graphs from COBOL to interpret the English.
> ● Use the analogy engine to compare the graphs derived from COBOL
> to the graphs derived from English.
> ● Record the similarities and discrepancies.    (017)
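
The last two steps of that method can be illustrated with a minimal
sketch.  It assumes each graph is reduced to a set of
(subject, relation, object) triples and compares them by plain set
operations.  That is a deliberately crude stand-in, not the VivoMind
analogy engine, and every name and triple in it is invented for
illustration:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of a toy graph: subject --relation--> obj."""
    subject: str
    relation: str
    obj: str


def compare_graphs(from_cobol: set, from_english: set) -> dict:
    """Record what the two sets of triples share and where they differ."""
    return {
        "similarities": from_cobol & from_english,
        "only_in_cobol": from_cobol - from_english,
        "only_in_english": from_english - from_cobol,
    }


# Hypothetical triples; real graphs would come from a parser.
cobol_graph = {
    Triple("PAYCALC", "reads", "EMPLOYEE-FILE"),
    Triple("PAYCALC", "writes", "PAYROLL-REPORT"),
}
english_graph = {
    Triple("PAYCALC", "reads", "EMPLOYEE-FILE"),
    Triple("PAYCALC", "writes", "AUDIT-LOG"),
}

report = compare_graphs(cobol_graph, english_graph)
for kind, triples in report.items():
    print(kind, sorted(triples))
```

Exact set comparison only finds identical triples.  An analogy
engine would also have to find approximate or structural matches,
which this sketch does not attempt.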

The Intellitex parser used a general ontology with very few axioms.
But the details needed for the application came from analyzing
the data structures and definitions in COBOL, adding all the
names of files and programs to the English vocabulary, and
translating that information to conceptual graphs (CGs).    (018)

Then those CGs served as the semantic foundation for interpreting
the English sentences.  Any sentences that did not refer to anything
derived from the COBOL programs were ignored as irrelevant.    (019)
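
As a rough sketch of that idea, the following assumes the names can
be pulled out of COBOL source with a few regular expressions and that
a sentence is relevant if it mentions at least one of them.  The
COBOL fragment, the patterns, and the function names are all
hypothetical; the real system built conceptual graphs rather than
matching tokens:

```python
import re

# Hypothetical COBOL fragment, invented for this example.
COBOL_SOURCE = """\
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PAYCALC.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT EMPLOYEE-FILE ASSIGN TO EMPFILE.
       DATA DIVISION.
       FILE SECTION.
       FD  EMPLOYEE-FILE.
       01  EMPLOYEE-RECORD.
           05  EMP-ID        PIC 9(6).
           05  GROSS-PAY     PIC 9(7)V99.
"""

NAME = r"[A-Z0-9][A-Z0-9-]*"
PATTERNS = [
    rf"PROGRAM-ID\.\s+({NAME})",   # program names
    rf"\bSELECT\s+({NAME})",       # file names
    rf"^\s*\d{{2}}\s+({NAME})",    # data items introduced by a level number
]


def extract_names(source: str) -> set:
    """Collect program, file, and data-item names from COBOL source."""
    names = set()
    for pattern in PATTERNS:
        names.update(re.findall(pattern, source, flags=re.MULTILINE))
    return names


def relevant_sentences(sentences: list, names: set) -> list:
    """Keep sentences that mention a derived name; ignore the rest."""
    kept = []
    for sentence in sentences:
        tokens = set(re.findall(NAME, sentence.upper()))
        if tokens & names:
            kept.append(sentence)
    return kept


vocabulary = extract_names(COBOL_SOURCE)
sentences = [
    "The PAYCALC program reads EMPLOYEE-FILE every night.",
    "The cafeteria closes at noon on Fridays.",
]
print(relevant_sentences(sentences, vocabulary))
```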

Note that this application did not require any predefined axioms
or any detailed ontology other than a simple hierarchy of terms.
But it did require a very large and detailed low-level ontology.
Fortunately, that ontology could be automatically derived from
a formal language (COBOL).    (020)

I won't claim that this is the only way to do language understanding,
but it worked very well.  It shows how a very detailed low-level
ontology can be used to solve a complex problem without requiring
any axioms from an upper-level or mid-level ontology.  It did,
however, require the kind of lexical information that could be
derived from WordNet or something similar.    (021)

I'll admit that more details at the upper and mid levels would
often be useful.  But it is also possible to derive much of that
information automatically by analyzing documents in English.
The application in slides 33 to 41 is an example that shows how.    (022)

Slide 34 shows the source material:  79 documents, some of which
were research reports and others chapters from a textbook on
geology.  Before answering any queries,
the VivoMind system translated all the documents to conceptual
graphs and indexed them with the Cognitive Memory (TM) system.    (023)

Slide 40 shows how the query (a paragraph written by a professional
geologist) was related to the answer by using information from
Chapters 44 and 45 of the textbook.    (024)

The answer to the geologist's query was derived from a research
report (McCaffrey and Kneller 2001), but it was not possible
to match the sentences in the query directly to that report.    (025)

However, Chapter 44 contained information about three terms:
"lowstand fan", "passive margin", and "turbiditic sandstones".
Chapter 45 contributed information about three other terms:
"narrow feeder corridors", "stratigraphic onlap", and
"intraslope basin".    (026)

Slide 41 summarizes the method:    (027)

> Emergent Knowledge
> When reading the 79 documents,
> ● VLP translates the sentences and paragraphs to CGs.
> ● But it does not do any further analysis of the documents.
> When a geologist asks a question,
> ● The VivoMind system may find related phrases in many sources.
> ● To connect those phrases, it may need to do further searches.
> ● The result is a conceptual graph that relates the question to
> multiple passages in multiple sources.
> ● Some of those sources might contribute information that does not
> have any words that came from the original question.
> ● That new CG can be used to answer further questions.
> By a “Socratic” dialog, the geologist can lead the system to
> explore novel paths and discover unexpected patterns.    (028)
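
To make the "further searches" step concrete, here is a toy sketch.
It assumes each source is reduced to the set of terms it mentions and
uses a breadth-first search over shared terms to link a question to
a source that has no words in common with it.  The source names and
the assignment of terms to sources are invented for illustration;
they do not reproduce the content of the actual documents, and the
real system works on conceptual graphs rather than bags of terms:

```python
from collections import deque


def connect(query_terms: set, sources: dict, target: str):
    """Return a chain of source names linking the query to `target`.

    Two sources are linked when they share at least one term.
    Returns None when no chain exists.
    """
    queue = deque()
    seen = set()
    # Start from every source that shares a term with the question.
    for name, terms in sources.items():
        if terms & query_terms:
            queue.append([name])
            seen.add(name)
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # Further search: follow terms of the last source to new sources.
        for name, terms in sources.items():
            if name not in seen and terms & sources[path[-1]]:
                seen.add(name)
                queue.append(path + [name])
    return None


# Hypothetical term sets, not taken from the real documents.
sources = {
    "textbook_chapter": {"lowstand fan", "intraslope basin"},
    "research_report": {"intraslope basin", "stratigraphic onlap"},
    "unrelated_note": {"river delta"},
}

chain = connect({"lowstand fan"}, sources, "research_report")
print(chain)
```

In this toy data the report shares no term with the question; it is
reached only through the textbook chapter, which is the situation
the slide describes.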

Note that these applications show that the detailed, low-level,
highly domain-dependent information is the most important.  But
that information is usually voluminous.  It is impractical
or impossible to define all of it in advance, especially by hand.
But automated methods can often derive that kind of information
from unstructured, natural language documents.    (029)

John    (030)
