
Re: [ontolog-forum] Fundamental questions about ontology use and reuse

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Wed, 24 Jun 2009 12:30:44 -0400
Message-id: <4A4254B4.4010303@xxxxxxxxxxx>
Pat,    (01)

I'm responding to this note under the "Fundamental questions" thread,
because it's more closely related to that thread.    (02)

PC> One of the possible components of the Foundation Ontology project
 > that I think should be included is an API that serves to enable
 > global search over any set of relational databases that have elements
 > mapped to the FO.    (03)

There are four kinds of global search, each of which has very different
requirements:    (04)

  1. A very precise search that supports detailed reasoning.  Some
     systems do that by aligning multiple databases in a federation
     that enables SQL queries to be executed across all the DBs in
     the federation.  Wolfram Alpha also does that with their curated
     and carefully structured databases, which are designed to work
     together in a similar federation.  That federation presupposes
     a common ontology (developed by Wolfram) with axioms that support
     the necessary operations.    (05)

  2. The search performed by Google, Yahoo, etc. does not support
     the reasoning that can be done with multiple federated databases.
     What they do is information retrieval, and any system (human or
     computer) that uses the retrieved information must examine the
     retrieved files to extract whatever information it needs.    (06)

  3. Search and computation against data that has already been
     aligned according to some standard *terminology*.  Many web
     sites have data (and forms for entering data) that are tagged
     with terms such as 'first name', 'last name', 'address', etc.
     Those terminologies have very few axioms, and they make very
     few assumptions about the details of whatever those terms
     refer to.  But for a great many applications, such as selling
     books and mailing packages, people use those terms with
     sufficient consistency that the applications work successfully.    (07)

  4. Search and inference across multiple texts written in natural
     language and multiple databases that have not been aligned.
     This is the kind of task that VivoMind specializes in, and
     I'll say more about it below.    (08)
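To make the first kind of search concrete, here is a minimal Python
sketch of a query spanning two databases in a federation.  SQLite's
ATTACH stands in for a real federation layer, and all table and
column names are invented for the example; the point is only that one
SQL statement can span both databases because both sides agree on
what the shared keys mean (the common ontology, in effect).

```python
# Minimal sketch of kind #1: two databases federated under a shared
# schema (hypothetical tables and columns), queried with one SQL
# statement.  SQLite's ATTACH stands in for a real federation layer.
import sqlite3

conn = sqlite3.connect(":memory:")          # primary database
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'Smith')")

conn.execute("ATTACH DATABASE ':memory:' AS warehouse")  # second database
conn.execute("CREATE TABLE warehouse.shipments (order_id INTEGER, status TEXT)")
conn.execute("INSERT INTO warehouse.shipments VALUES (1, 'shipped')")

# One query spans both databases -- possible only because both sides
# agree on what 'id' and 'order_id' refer to.
rows = conn.execute("""
    SELECT o.customer, s.status
    FROM orders o JOIN warehouse.shipments s ON o.id = s.order_id
""").fetchall()
print(rows)   # [('Smith', 'shipped')]
```

Without that prior agreement on the schema, the join above would be
meaningless -- which is exactly why federation presupposes a common
ontology.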

PC> This is a capability advertised by Cyc and Ontology Works (and
 > a few others with which I am not familiar), but no one can
 > actually try it out on any scale without spending a few hundred+
 > kilobucks to hire the companies.    (09)

An ontology won't reduce the cost in the slightest -- *except for*
those databases that had previously been designed or federated
according to that ontology.  If you want to federate the databases
before doing the inferences, you would still need the services of
a company such as OntologyWorks, or you would have to do the
equivalent work by yourself (and/or by your employees).    (010)

PC> Having a working ontology-based RDB integration system would,
 > I expect, make a lot more people interested in the potential for
 > an ontology.    (011)

That is true *only* for those databases that had previously been
integrated, either designed from the ground up on a common ontology
or previously federated by a company such as OntologyWorks.    (012)

If they hadn't been previously federated, you're back to the step
of spending "a few hundred+ kilobucks" to hire OntologyWorks or
to do the equivalent work with your own employees.    (013)

PC> I envision it also includes development of an effective
 > Natural Language interface...    (014)

Fortunately, you don't need to develop such an interface, because
you can point to the work by VivoMind, which already does that,
but without using anything that resembles the kind of foundational
ontology you are proposing.    (015)

PC> You have said on numerous occasions, and I agree, that it is
 > important to take legacy systems into consideration to encourage
 > adoption of a new technology.   This is one way to do it and
 > still provide a basis for scale up to the more demanding
 > applications that could take full advantage of the logical
 > inferencing potential of an ontology.    (016)

This is the kind of work that we do today at VivoMind.  Before
reading the rest of this email note, I suggest that you look
at the results from some actually implemented systems:    (017)

    http://www.jfsowa.com/talks/pursue.pdf    (018)

All the slides are relevant, but the three applications I'll
discuss begin on slide #14.    (019)

   *** Pause while you open that file and turn to slide #14 ***    (020)

Slide #14 summarizes the three applications.  The first two use
older versions of VivoMind software, and the third uses some of
our latest software (which is capable of supporting the first
two on a much larger scale than the old software).    (021)

The first application, Educational Software, is described on
slides #15 to #19.  Three different companies tried to do it.    (022)

The first company did something along the lines you are proposing:
use a large ontology and deductive methods of reasoning.  That
approach failed for reasons stated on slide #18.  The second one,
which used a statistical method, also failed.  See slide #19.    (023)

Slides #20 to #23 describe the VivoMind approach, which worked.
For interpreting natural language, we did *not* use a large
general-purpose upper ontology.  Instead, we used lexical
resources along the lines of WordNet, Roget's Thesaurus,
and VerbNet.  As you know, those resources contain very few
axioms or assumptions other than type-subtype and part-whole.    (024)
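To make that contrast concrete, here is a toy sketch (in Python, with
invented entries, not taken from any real resource) of what such a
lexical resource amounts to: bare type-subtype and part-whole links,
with no further axioms to reason from.

```python
# Toy lexical resource in the spirit of WordNet: only type-subtype
# (hypernym) and part-whole (meronym) links, no further axioms.
# All entries are illustrative, not from any real resource.
HYPERNYMS = {"dog": "mammal", "mammal": "animal", "oak": "tree"}
MERONYMS  = {"wheel": "car", "branch": "tree"}

def is_a(term, ancestor):
    """Follow hypernym links upward; the only 'inference' available."""
    while term in HYPERNYMS:
        term = HYPERNYMS[term]
        if term == ancestor:
            return True
    return False

def part_of(term, whole):
    """Direct part-whole lookup; no transitivity is assumed here."""
    return MERONYMS.get(term) == whole

print(is_a("dog", "animal"))      # True
print(is_a("oak", "animal"))      # False
print(part_of("branch", "tree"))  # True
```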

Slides #20 and #21 show the kinds of definitions used for
a domain-dependent ontology about arithmetic.  Only a dozen
or so conceptual graphs of the kind illustrated there were
sufficient as a supplement to the lexical resources.    (025)

Slides #22 and #23 discuss how the VivoMind software used
case-based reasoning to solve the problem.    (026)

Slides #24 to #27 discuss the legacy re-engineering problem,
which used a small domain-dependent ontology for analyzing
COBOL programs together with lexical resources along the
lines mentioned above.    (027)

Slides #24 and #25 describe the problem and the VivoMind approach.
Slide #26 shows a typical paragraph from the English documentation.
Note the following points:    (028)

  1. The English consists of some ordinary English words that are
     found in the lexical resources plus a lot of computer jargon
     and named entities that are found only in this domain.    (029)

  2. Interpreting such English without a detailed ontology would be
     impossible.  However, the first step (discussed in slide #25)
     used an off-the-shelf grammar for COBOL and a domain ontology
     to translate the COBOL to conceptual graphs.    (030)

  3. The domain ontology (written by Arun Majumdar) assumed one
     concept type for each COBOL syntactic type.  Arun defined
     additional concept and relation types to group the COBOL
     types in more general supertypes and some conceptual graphs
     to relate the COBOL types to English words (either from
     WordNet or from the jargon used in the domain).    (031)

  4. Arun translated the COBOL grammar to Prolog rules, which
     invoked the same VivoMind rules that generated CGs from
     English.  While parsing the COBOL, the parser made a list
     of all named entities (program names, file names, variable
     names, and named data items) and linked them to all graphs
     in which they were mentioned.    (032)

  5. Then the Intellitex parser used the conceptual graphs and
     named entities derived from COBOL to interpret the English,
     such as the example in slide #26.    (033)
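The bookkeeping described in step #4 can be sketched roughly as
follows.  The graph identifiers and COBOL-style names below are
invented placeholders for illustration, not VivoMind data structures:
each parsed unit yields a conceptual graph, and every named entity
found in it is linked back to the graphs that mention it.

```python
# Hypothetical sketch of step #4's bookkeeping: as each COBOL unit is
# parsed into a conceptual graph (a string placeholder here), every
# named entity found in it is linked to that graph.
from collections import defaultdict

def index_named_entities(parsed_units):
    """parsed_units: list of (graph_id, [entity names mentioned])."""
    index = defaultdict(list)
    for graph_id, entities in parsed_units:
        for name in entities:
            index[name].append(graph_id)
    return index

units = [("g1", ["CUST-FILE", "WS-TOTAL"]),
         ("g2", ["CUST-FILE"])]
idx = index_named_entities(units)
print(idx["CUST-FILE"])   # ['g1', 'g2']
```

An index of this shape is what lets a later pass (step #5) resolve a
name in the English documentation to the graphs derived from COBOL.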

Slide #27 shows the final results, which were exactly what the
customer wanted.  A major consulting firm estimated that the
task would require 80 person years to do by hand.  With the
VivoMind software, it took 15 person weeks plus 3 computer weeks.    (034)

Slides #28 and #29 describe the differences between the old
VivoMind software and the new VivoMind Language Processor (VLP),
which we are actively developing and extending.    (035)

Slides #30 to #38 describe the application to Oil and Gas
Exploration.  This application does the fourth kind of search
described above:  Search and inference across multiple texts
written in natural language and multiple databases that have
not been previously aligned.    (036)

Like the other examples, it uses lexical resources, not a
true ontology.  We have upgraded the resources in the past
few years, but most of the resources were downloaded for
free from the WWW.    (037)

For some of the resources we did some integration and
alignment.  But for others (such as Roget's Thesaurus) we
did not attempt to do any integration.  Instead, the agents
dynamically do whatever alignment seems appropriate during
the parsing.  For more information about that, see    (038)

    http://www.jfsowa.com/pubs/paradigm.pdf    (039)

The domain ontology was written by EGI (Earth and Geoscience
Institute) with some tutoring and consulting by Arun and me.
As a result of this work, we have developed some semi-automated
development aids that enable a domain expert with no knowledge
of any special knowledge representation language to write the
domain ontology:    (040)

  1. Analysis and extraction tools that find all the words in
     the source documents that are not already in the lexical
     resources or in any list of named entities.    (041)

  2. A generator that proposes a tentative ontology by forming
     hypotheses about how the unknown terms are related to known
     terms and to one another.    (042)

  3. The domain expert can edit the tentative ontology to
     correct any errors and to add any additional concept
     or relation types.    (043)

  4. Steps #1, #2, and #3 can be iterated as many times as
     needed to improve the ontology.    (044)

  5. The domain expert(s) can use controlled English to
     write more detailed axioms needed for inferencing.    (045)

  6. The VivoMind software checks the axioms from #5 for
     consistency with the tentative ontology and with the
     other resources used for interpreting the English.    (046)

  7. Steps #1 to #6 can be reiterated with additional
     source documents until the domain experts are
     satisfied that the VLP system is interpreting the
     documents correctly.    (047)

We are still working on these tools to reduce the human effort
as much as possible.  Our goal is to enable the domain experts
to generate their own ontologies with a minimal amount of
tutorials and consulting from VivoMind.    (048)

This approach is working very well.  It's possible that more
general upper-level ontologies could be useful.  If so, the VLP
system could use them.  But we don't require any such ontology
to implement applications along the lines of the examples
presented in those slides.    (049)

John    (050)

