
Re: [ontolog-forum] Integrating Semantic Systems

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Fri, 11 Jun 2010 09:13:04 -0400
Message-id: <4C123660.1070508@xxxxxxxxxxx>
First, I'd like to thank everybody who made comments on my slides,
either online or offline.  They helped me clarify and sharpen many
of the points.  Unfortunately I can't go into more detail, since
it's only a 3-hour tutorial, and I've set 100 as an upper limit on
the number of slides.  For the record, here's the latest version:    (01)

    http://www.jfsowa.com/talks/iss.pdf    (02)

But there are several issues that deserve more attention, and I
can't address them adequately in the slides.  Following are three:    (03)

  1. Why am I so negative about the Semantic Web?    (04)

  2. Why do I put so much emphasis on controlled natural languages,
     since they're just "syntactic sugar"?    (05)

  3. Why don't I address the issues of logics with different
     levels of expressive power and the methods of mapping
     between them?    (06)

On the first point, Dan Brickley dug out an email note I wrote
in 1998 that was very hopeful about the Semantic Web.  In fact,
that's one of the main reasons why I often make negative remarks:
it hasn't lived up to its promise.  He also asked me to make a
concrete proposal of what to do about it, and the iss.pdf slides
cover a large part of what I would like to say.    (07)

To be fair, there is probably nobody specific who can be blamed.
Tim Berners-Lee had a grand vision, which turned out to be rather
less grand, but the fault is with the organization:  Committees
are very good at evaluating proposals, but they are terrible at
producing a unified, elegant design.    (08)

Steve Jobs, for example, leads Apple to produce great designs.
That is partly because he has good taste, partly because he also
holds the purse strings, but mostly because he takes a direct,
hands-on role in the final design.  Sony, for example, has far
more experience in consumer electronics, and they produced great
designs when their founder was in control.  But their current
committees can't compete with Steve.    (09)

I have also said that the single worst feature of the Semantic Web
is the word 'web'.  The WWW certainly raises many practical problems,
but there is not a single semantic issue that is unique to the WWW.
Unfortunately, the only design document for the SW was that layer
cake, which puts syntax at the foundation.  See Slide 21.    (010)

On the left of Slide 21 is the original layer cake, and a more
recent layer cake is on the right.  A sign that something has gone
wrong is that the green layer labeled 'Logic' has shrunk to a tiny
green box labeled 'Unifying Logic'.  Furthermore, that box sits on
top of independently developed components, each with a different
semantics that has its own model theory (or worse, none at all).
That's a multi-humped camel that only a committee could produce.    (011)

Even worse, the new humps keep proliferating.  I'll say
more about that in connection with point #3.    (012)

On point #2, Alan Perlis stated one of his famous epigrams:    (013)

    Syntactic sugar causes cancer of the semicolon.    (014)

For some kinds of languages, there is some truth to that.  But there's
another widely quoted epigram by that famous author, Anonymous:    (015)

    C is assembly language with semicolons.    (016)

There's some truth to the claim that C is syntactic sugar for assembly
language.  But no one seriously recommends that we go back to the bad
old days when operating systems were written in assembly language.    (017)

For mathematics, '2+2=4' is better than 'Two plus two equals four.'
But note that mathematical notation did not arise as a *replacement*
for natural languages, but as a system of abbreviations for aspects
of NLs that eventually took on a life of its own.  But mathematicians
still intersperse fragments of their notations in their speech and
writing.    (018)

Furthermore, when the symbols come thick and heavy, even mathematicians
skip the symbols and read the NL commentary before going back to read
the details in the symbols.  Peano once started a mathematical journal
that used *only* symbolic logic for the commentary.  It shut down after
printing the first three issues (all of which were written by Peano
or his students).    (019)

For an example of why controlled NLs are important, see the bottom half
of Slide 19.  (I would have put that on a separate slide, but I didn't
want to go over 100.)  That example, suggested by Sjir Nijssen, shows
how his group uses controlled English:    (020)

  1. Each table in a relational database (or each type of ground-level
     clause in logic or each type of triple in RDF) is described by
     one or more *fact types*, which are patterns stated in controlled
     English.    (021)

  2. Each fact in a database or network or logic instantiates the
     entity types in a fact type with names or other identifiers.    (022)

  3. Each clause in a constraint or rule or axiom consists of a
     fact type with quantifiers placed in front of the entity types.    (023)

  4. Clauses can be combined with Boolean operators (e.g., if-then
     for rules) to form controlled NL statements of arbitrary
     complexity.  (See slide 31 for FLIPP diagrams, which could be
     used to represent Boolean combinations of fact types.)    (024)

  5. These fact types form the basis for translating controlled NLs
     to and from whatever formal notation or database organization
     (table and/or network and/or predicate calculus) is used.    (025)

  6. By defining equivalent fact types in different languages,
     it is possible to "verbalize" the same database or knowledge
     base or ontology in different languages for different readers.    (026)

  7. This approach (or variants of it) can be used at every stage
     from design to development to help and explanation tools.    (027)

I won't claim that controlled NLs are a panacea, but they can
make a very important contribution to ontology design and use.    (028)
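To make the methodology concrete, here is a toy sketch of steps 1, 2,
5, and 6 above.  The names (FACT_TYPES, verbalize, parse), the example
fact type, and the Dutch wording are my own illustrative assumptions,
not anything from Nijssen's actual tools:    (028a)

```python
import re

# A fact type: a controlled-language pattern with typed slots, plus the
# predicate it maps to in the formal notation (steps 1 and 5).
FACT_TYPES = {
    "employment": {
        "predicate": "worksFor",
        "patterns": {  # equivalent fact types in two languages (step 6)
            "en": "Employee {employee} works for department {department}.",
            "nl": "Medewerker {employee} werkt voor afdeling {department}.",
        },
    },
}

def verbalize(fact_type, lang, employee, department):
    """Instantiate the entity types in a fact type with names (step 2)."""
    pattern = FACT_TYPES[fact_type]["patterns"][lang]
    return pattern.format(employee=employee, department=department)

def parse(fact_type, lang, sentence):
    """Translate a controlled-language sentence back to a triple (step 5)."""
    ft = FACT_TYPES[fact_type]
    regex = re.escape(ft["patterns"][lang])
    regex = regex.replace(r"\{employee\}", r"(?P<e>\w+)")
    regex = regex.replace(r"\{department\}", r"(?P<d>\w+)")
    m = re.fullmatch(regex, sentence)
    return (m["e"], ft["predicate"], m["d"])

s = verbalize("employment", "en", "Alice", "Sales")
print(s)                             # Employee Alice works for department Sales.
print(parse("employment", "en", s))  # ('Alice', 'worksFor', 'Sales')
print(verbalize("employment", "nl", "Alice", "Sales"))
```

The same fact instance round-trips between the controlled sentence and
the triple, and the "nl" pattern verbalizes the identical fact for a
Dutch reader, which is the point of step 6.    (028b)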

Regarding point #3, I complained about the word 'approximation'
in the following passage from a web site about TrOWL:    (029)

 > TrOWL utilises a semantic approximation to transform OWL2-DL
 > ontologies into OWL2-QL for conjunctive query answering and
 > a syntactic approximation from OWL2-DL to OWL2-EL for TBox
 > and ABox reasoning.    (030)

The authors explained that they used the word 'approximation' to
describe the lossy mapping from a more expressive language to a
less expressive language.  But I have two kinds of complaints:
first, the word 'approximation' is misleading; second, all those
variations of notations are more humps on an overloaded camel.    (031)

Consider the controlled English methodology above.  The users
(ranging from subject-matter experts to software implementers
to end users) see controlled English as their primary interface.
Fact types are the primary patterns for thinking about everything
from the ontology to the constraints, the rules, the queries,
and the help, diagnostic, and explanation facilities.    (032)

As another example, consider the family of UML diagrams (Slide 32).
Each of those diagrams uses a different subset of logic, and many
of them also add a different chunk of ontology to that logic.
But nobody talks about "approximations" in mapping one diagram
to another.  Instead, each one of them represents a different
aspect or view of the same system.  The UML family began as
informal notations, but they are now being defined in terms
of Common Logic, as the unifying framework.    (033)

As a third example, which demonstrates how a multiplicity
of different subsets can be implemented, I'd mention Cyc
(which is discussed in slides 13 to 15).  They have a single,
very expressive logic written in the Cyc Language (CycL).
But they have several dozen different inference engines and
related tools, which are specialized for different subsets
that use different reasoning methods.    (034)

The knowledge engineers and other users don't know or care which
subset they're using.  The system determines *dynamically* which
inference engine or other processor to invoke for any particular
problem.  Nobody talks about "approximating" the knowledge. They
just use one expressive language, and the system automatically
determines which subset or inference engine to use as needed.    (035)
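The dispatch idea can be sketched in a few lines.  This is purely my
illustration, not Cyc's actual architecture or API: one expressive
input language, with the system inspecting each problem and routing it
to a specialized engine:    (035a)

```python
# Hypothetical sketch of subset-based dispatch (not Cyc's actual code):
# users write in the full language; the router picks an engine based on
# which syntactic subset the clause falls into.

def is_horn(clause):
    """clause = (premises, conclusions); Horn means at most one conclusion."""
    _premises, conclusions = clause
    return len(conclusions) <= 1

def dispatch(clause):
    """Choose an inference engine for this clause; invisible to the user."""
    if is_horn(clause):
        return "horn-engine"      # e.g. fast forward/backward chaining
    return "general-prover"       # fall back to a full theorem prover

rule = (["human(x)"], ["mortal(x)"])                      # Horn clause
disj = (["parent(x,y)"], ["father(x,y)", "mother(x,y)"])  # disjunctive conclusion

print(dispatch(rule))  # horn-engine
print(dispatch(disj))  # general-prover
```

Nothing is "approximated" here: both clauses stay in the one knowledge
base, and only the choice of processor differs.    (035b)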

That very general method had some performance issues, which have
been resolved over the years.  In addition to the dynamic methods
used in Cyc, there are also offline methods for extracting parts
of the knowledge base from Cyc and statically mapping it
to other systems.  (See the URL at the bottom of slide 15.)
That method does not "approximate" anything, and it does not
"lose" anything.  It just maps it in different ways for different
processing units.    (036)

For further discussion of issues about expressive power (and the
methods cited in slide 15), see pp. 4 to 6 of the following:    (037)

    Fads and Fallacies about Logic    (038)

This paper, by the way, was published in a journal of which Jim Hendler
was the editor.  Jim has been a very strong proponent of the Semantic
Web, and we have had some disagreements about it.  But Jim liked the
paper and agreed with the basic points.    (039)

John Sowa    (040)

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx    (041)
