ontolog-forum

Re: [ontolog-forum] Difference between XML and OWL

To: "John F. Sowa" <sowa@xxxxxxxxxxx>
Cc: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Ed Barkmeyer <edbark@xxxxxxxx>
Date: Thu, 23 Oct 2008 14:19:51 -0400
Message-id: <4900C047.20503@xxxxxxxx>
John,    (01)

I think we are pretty much in agreement.  We came at this from different 
viewpoints, with different prejudices, and the clarifications have 
revealed that we have reached many of the same conclusions, which may 
only mean that we are both wrong on some points.    (02)

Some clarifications:    (03)

you wrote:    (04)

> Note that computational complexity, compiler-based methods, and
> methodologies are very distinct issues.  Restricting expressiveness
> is not a silver bullet that can hit the target on all three.  I
> believe that a lot of money and R & D effort was wasted in trying.    (05)

The lead-in to this discussion was "computational complexity", which was 
what I was talking about.  I don't think rolling these concerns up in 
one blanket has been anyone's objective.  Bounding computational 
complexity is the big rationale for Description Logic and all of its 
variants, and the key criterion for admissibility of extensions and 
variants in the DAML->OWL program, and many related Semantic Web activities.    (06)

The methodology part for OWL and friends is only beginning to emerge, 
and I don't think it was a target of the DAML->OWL program.  Methodology 
development based on the "common upper ontology" approach is probably 
the dominant form in FOL land, although I don't doubt there are others. 
  And I understand "compiler-based methods" to refer to logic 
programming, a la Prolog, JESS and Jena, which is certainly knowledge 
engineering, but entirely different from "ontology development" a la OWL 
or CLIF in my view.    (07)

So this is a very mixed bag, and we are talking past each other.    (08)

> EB> Let's start with Horn clauses and rules engines and work our way
>  > through description logics to FOL reasoners.  It seems to me they
>  > all restrict the KR language.
> 
> Of course they do.  But not for all three of the above reasons.    (09)

I misunderstood what you were saying.  The purpose of all of these 
restrictions is to enable the reliable behavior of certain algorithms 
for reasoning.  That in turn guarantees that knowledge engineered in 
that way can be used with known tooling to achieve practical results.    (010)
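
As a small illustration of how a language restriction buys reliable algorithm behavior, here is a minimal sketch of forward chaining over propositional definite Horn clauses. The atom names and the two rules are invented for the example, not taken from any real ontology. Because each rule has exactly one positive head and the set of atoms is finite, the loop can only add atoms and must reach a fixpoint.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A definite Horn clause: (set of body atoms, single head atom).
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Apply rules until no new atom can be derived (a fixpoint)."""
    known = set(facts)
    rule_list = list(rules)
    changed = True
    while changed:
        changed = False
        for body, head in rule_list:
            # Fire a rule only when its whole body is already known.
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known


# Hypothetical rules, for illustration only.
rules = [
    (frozenset({"part_of_engine", "engine_in_car"}), "part_of_car"),
    (frozenset({"part_of_car"}), "needs_vin_record"),
]
derived = forward_chain({"part_of_engine", "engine_in_car"}, rules)
```

Termination here follows from the restriction itself: there are no disjunctive heads and no function symbols, so the set of derivable atoms is bounded by the atoms mentioned in the rules and facts.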

> EB>  Have you never heard knowledge engineers talk about
>  > "avoiding disjunctions" or "adding helper axioms" or "steering
>  > the reasoner"?  Outside of DL land, I have never met an AI student
>  > who doesn't understand those concepts, and the need for them.
> 
> Anybody who is doing that is actually writing programs in an
> inadequate version of Prolog.  I strongly approve of using Prolog
> as a very high-powered *programming* language, which happens to
> have some features of logic.    (011)

Actually, the people who do that are building "ontologies" for Vampire 
or Otter or some other FOL reasoner.  And their mode of expression is 
some FOL language in the CLIF/KIF family.    (012)

The ontologies are intended to be "general-purpose", and the reasoners 
may or may not contain "subproblem specialization" features.  But when 
tasked with getting some practical results from their ontology, the 
engineers often find themselves confronting the problem of accidental 
computational complexity.  The algorithms used by the reasoner, coupled 
with the ontology as written and the hypothesis proposed, send the 
reasoner off into the multi-hour result-free state that would be 
"undecided" in cricket.  But only for 2 of the 25 interesting test 
hypotheses.    (013)

The approaches I mentioned add a few additional lemmas or guidelines to 
help the reasoner get over the hump, explicitly for the purpose of using 
the ontology and the reasoner to solve the particular problems at hand. 
  And in the grand scheme of things, this is not really different from 
the situation in which the knowledge engineer devises work-arounds to 
get an OWL ontology and a DL reasoner to solve the problems.  The OWL 
engineer puts an ugly wart in the ontology, but knows that it guarantees 
computability of the target results in the problem space.  The KIF 
engineer tinkers with the engine inputs until s/he gets computability in 
tolerable time.    (014)
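
A toy sketch of the "helper lemma" effect, under loud assumptions: the "reasoner" is just a breadth-first search over integers, the base "axioms" are the steps n+1 and n*2, and the "lemma" is the step n*32, which is a consequence of five doublings and so adds nothing that was not already derivable. None of this models Vampire or Otter; it only shows that a redundant derived rule can shorten the search.

```python
from collections import deque
from typing import Callable, Iterable


def expansions_to_goal(start: int, goal: int,
                       successors: Callable[[int], Iterable[int]],
                       max_expansions: int = 100_000) -> int:
    """Breadth-first search. Returns the number of states expanded up to
    and including the goal, or -1 if the goal was not reached in budget."""
    frontier = deque([start])
    seen = {start}
    expanded = 0
    while frontier and expanded < max_expansions:
        state = frontier.popleft()
        expanded += 1
        if state == goal:
            return expanded
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return -1


def base_steps(n: int) -> Iterable[int]:
    # The hypothetical "axioms": add one, or double.
    return (n + 1, n * 2)


def steps_with_lemma(n: int) -> Iterable[int]:
    # Same axioms plus a derived shortcut: five doublings in one step.
    return (n + 1, n * 2, n * 32)


cost_without_lemma = expansions_to_goal(1, 96, base_steps)
cost_with_lemma = expansions_to_goal(1, 96, steps_with_lemma)
```

In this toy case the lemma reduces the number of expanded states, but only because it happens to suit the goal 96. A lemma that does not suit the goal merely widens the branching, which is why this kind of tuning is specific to the problems at hand.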

And the good thing about the ugly wart is that you can see it and the 
paper explains why.  By comparison, the KIF engineer's paper rarely or 
never mentions the tinkering, and when it finally works, s/he may not 
know why.    (015)

That was the counterpoint to the fact that the CLIF(-like) formulation 
is straightforward and easier to read and understand.    (016)

And all of this engineering is "programming", in the sense that it is 
building a software machine to solve a set of problems.  The software 
machine comprises the engine, the control settings, and the ontology. 
In this area, the difference between "ontology engineering" and "logic 
programming" is that the "ontology" nominally captures knowledge in a 
problem-independent way, and to the extent that that is true, the 
ontology is reusable, while the "logic program" captures the relevant 
knowledge in a problem-specific way, more or less "from the get-go". 
But at the far end of the process, the "ontology engineer" carries out 
"programming" steps to get problem solutions out of the 
would-be-reusable ontology -- it has to be usable before it can be reusable.    (017)

> EB> It is all about tuning the reasoning process to a class of
>  > problem, after you realize that writing down the problem the way
>  > you understood it did not produce acceptable performance from
>  > the reasoner(s).
> 
> Once you do that, you are no longer writing an ontology.  You
> are writing a program.      (018)

I agree completely.  You are continuing the engineering process to 
achieve its objective -- solving a given set of problems.    (019)

> Modern DLs were introduced by Bill Woods in 1975 and Ron Brachman
> in 1979.  That's 30 years of R & D.  There are always finer and
> finer incremental improvements, but the basic points are clear.    (020)

Yes, and by 1990, there were a dozen published extensions that enabled 
solution of different kinds of problems.  The main contribution of the 
DAML research was to find maximal combinations of the published 
extensions that produced in toto a still computationally bounded 
algorithm.  And there are useful extensions, or sets of extensions, that 
were shown to be incompatible in that regard.  And as you might guess, 
the maximal sets included a few tweaks and constraints on the published 
extensions they included.    (021)

> One of the most significant points is that the issues of decidability
> and computational complexity, which are very important in theory, are
> a red herring when it comes to practice.  The people working on Cyc
> have found that undecidable problems, which are theoretically possible
> with a rich language such as CycL, rarely, if ever, occur in practice.    (022)

True.  But what does occur in practice is the reasoner running for an 
hour without a result, and there being no way to know whether it would 
find a result in 6 more minutes, or 6 more days, or never.    (023)
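
The situation can be sketched with a search that is given a step budget and has three possible outcomes. This is a toy, assuming a breadth-first search over integers in place of a real prover; the names Outcome, bounded_search, grow and capped are invented for the example. When the space is infinite and the goal is not derivable, the only answer the budgeted search can ever give is "unknown", however large the budget.

```python
from collections import deque
from enum import Enum
from typing import Callable, Iterable


class Outcome(Enum):
    PROVED = "proved"        # goal reached within the budget
    EXHAUSTED = "exhausted"  # finite space fully searched, goal absent
    UNKNOWN = "unknown"      # budget ran out; nothing can be concluded


def bounded_search(start: int, goal: int,
                   successors: Callable[[int], Iterable[int]],
                   max_steps: int) -> Outcome:
    frontier = deque([start])
    seen = {start}
    steps = 0
    while frontier:
        if steps >= max_steps:
            return Outcome.UNKNOWN
        state = frontier.popleft()
        steps += 1
        if state == goal:
            return Outcome.PROVED
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return Outcome.EXHAUSTED


def grow(n: int) -> Iterable[int]:
    # Infinite space: every state has new successors.
    return (n * 2, n + 3)


def capped(n: int) -> Iterable[int]:
    # Same steps, but restricted to a finite space.
    return tuple(m for m in (n * 2, n + 3) if m <= 20)


reachable = bounded_search(1, 11, grow, max_steps=50)         # 1 -> 4 -> 8 -> 11
too_small_budget = bounded_search(1, 11, grow, max_steps=5)
never_in_infinite = bounded_search(1, 3, grow, max_steps=1000)
never_in_finite = bounded_search(1, 3, capped, max_steps=1000)
```

Starting from 1, no state is ever a multiple of 3, so the goal 3 is unreachable. In the capped space the search can say so; in the unbounded space it can only report that the budget ran out.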

> The major difficulties they encountered:
> 
>  1. Managing and organizing a very large knowledge base and finding
>     the relevant axioms necessary to solve any specific problem.
> 
>  2. Missing information that had not been anticipated by the kn.
>     engineers who developed the KB.
> 
>  3. Mismatches between the ontology as implemented and the raw data
>     and problems that must be addressed.
> 
>  4. Nonmonotonic issues of defaults, exceptions, and uncertainty.    (024)

Welcome to knowledge engineering in the real world.
It is this kind of experience that soured many sponsors on knowledge 
engineering 20 years ago.  As ugly as it may be, the OWL effort is 
making a lot more people aware of what can be done, and of course, 
encountering problems like these.  #1 is a big problem for medical 
ontologies; #3 is a big problem for the intelligence community.    (025)

> Those practical problems are vastly more significant than the
> hundreds of papers and dissertations that prove that a certain
> algorithm on a certain kind of problem is "tractable" -- that
> means solvable in polynomial time.  Those problems #1 to #4 with
> Cyc are all "tractable".  But polynomial time is not good enough.    (026)

I don't disagree.  The problem with problems 1-4 above is that they are 
not theoretical problems.  They are specific to the problem, the problem 
space, and the problem environment.    (027)

[Aside: As long as the criterion for getting the degree in engineering 
is the quality of the mathematics rather than the complexity of the 
actual problem it solves, we will have this divergence between theses 
and value.  But this is the accepted situation in most engineering 
disciplines, as distinct from the sciences.  The advanced degree 
demonstrates that the engineer understands the theory of his/her 
discipline, can commit to intensive research, and can devise something 
new, however small.  By comparison, a grunt engineer can solve a lot of 
industrial problems without having any of those attributes.]    (028)

> When Knuth was talking about "premature optimization", he meant
> that nobody can know in advance what optimization is needed.
> Current theorem provers are fairly good at "mature optimization":
> 
> As an example, consider the WHERE-clause of an SQL query, which
> has the expressive power of full FOL.      (029)

and the added simplification called 'negation as failure'.    (030)
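
A minimal sketch of what that simplification means in practice, using Python's standard sqlite3 module; the tables and rows are invented for the example. The NOT EXISTS test succeeds for anyone with no matching row in manages: the absence of a stored fact is treated as its negation. An FOL reasoner given only the same positive facts could not conclude that 'cy' has no manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT PRIMARY KEY);
    CREATE TABLE manages (boss TEXT, report TEXT);
    INSERT INTO employee VALUES ('ann'), ('bob'), ('cy');
    INSERT INTO manages VALUES ('ann', 'bob');
""")

# "Employees with no manager": true of anyone for whom the database
# simply holds no manages row, i.e. negation as failure to find one.
rows = conn.execute("""
    SELECT e.name
    FROM employee AS e
    WHERE NOT EXISTS (
        SELECT 1 FROM manages AS m WHERE m.report = e.name
    )
    ORDER BY e.name
""").fetchall()
unmanaged = [name for (name,) in rows]
conn.close()
```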

> That same WHERE-clause
> can be used with many different SELECT statements.  Depending
> on which columns of which tables are being selected, very
> different optimizations are necessary.  A premature choice of
> optimization can take orders of magnitude more time than the
> correct choice.    (031)

I would say that is contrived.  There is a lot of research in this area, 
and it is all about optimizing performance of the whole query, taking 
into account the projected elements as well as the selection criteria, 
and the distribution of the visited elements over the (possibly 
distributed) tables.    (032)

The premature optimization in databases occurs in the design of 
object-oriented databases, in which the information groups (objects) are 
pre-linked according to a particular view of the problem space.  This 
makes it very easy to implement queries that fit that view, and very 
complex and expensive to construct paths that collect the information 
according to a different view.    (033)
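
A small sketch of that asymmetry, with invented class and data names. Parts are pre-linked under assemblies, so the question that fits the design view is a direct lookup, while a question from another view (which assemblies depend on a given supplier) has to walk every assembly and every part.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Part:
    part_no: str
    supplier: str


@dataclass
class Assembly:
    name: str
    parts: List[Part] = field(default_factory=list)


# Hypothetical data, pre-linked according to the "assembly" view.
assemblies: Dict[str, Assembly] = {
    "gearbox": Assembly("gearbox", [Part("G-1", "acme"), Part("G-2", "bolt_co")]),
    "pump": Assembly("pump", [Part("P-1", "bolt_co")]),
    "frame": Assembly("frame", [Part("F-1", "steelworks")]),
}


def parts_of(store: Dict[str, Assembly], name: str) -> List[str]:
    # Fits the pre-linked view: one lookup, then follow the links.
    return [p.part_no for p in store[name].parts]


def assemblies_using_supplier(store: Dict[str, Assembly], supplier: str) -> List[str]:
    # A different view: no link leads from supplier to assembly,
    # so every assembly and every part has to be visited.
    return sorted(
        a.name for a in store.values()
        if any(p.supplier == supplier for p in a.parts)
    )


gearbox_parts = parts_of(assemblies, "gearbox")
bolt_co_users = assemblies_using_supplier(assemblies, "bolt_co")
```

A relational layout would hold the same facts as rows and leave the choice of access path to the query optimizer, which is the contrast being drawn here.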

> Exactly the same issues are involved in a theorem prover.
> In effect, each axiom of an ontology is similar to a WHERE-
> clause of an SQL statement, but unlike an SQL query, the WHERE-
> clause does not have an attached SELECT-clause.  That means it is
> *impossible* for a machine or even an educated human to determine
> in advance which optimization is appropriate.  The correct choice
> can only be determined at the instant when the axiom is invoked.    (034)

I am more than a bit suspicious of this analogy, but I won't argue.    (035)

> I agree that there are many crufty things in Cyc, and many things
> I would do differently.  But I don't want to get started in commenting
> on Cyc, which has gone through 24 years of evolution.    (036)

But the whole knowledge engineering trade has gone through 40 years of 
evolution.  And among our tools we have amphibians and birds and 
mammals, and each has its advantages and its weaknesses.  After 40 
years, they are all good at what they do well, and less weak in other 
areas than their ancestors.  But there are enough problems to feed all 
of CLIF and OWL and Prolog and Jena, and for the most part, the issues 
we have discussed are those that define the food groups.    (037)

> EB> It is all about tuning the reasoning process to a class of
>  > problem, after you realize that writing down the problem the way
>  > you understood it did not produce acceptable performance from
>  > the reasoner(s).
> 
> I would not call that writing an ontology.  I would call it programming,    (038)

I call the whole process "knowledge engineering" -- analyzing the 
problem space and the problems given, choosing the class of engine, 
writing the ontology, debugging it, and then tweaking the ontology and 
the engine to solve the problems that were originally posed.  You can 
call any part of that process "programming" if you like, but the idea of 
crafting and modifying the knowledge representation to solve the target 
problems is not restricted to logic programming.  The very description 
of the Cyc problems above supports that.    (039)

If you love Platonic upper ontologies and the capture of universal 
truth, don't let me rain on your parade.  But if your objective is to 
develop an ontology for marking up a set of unknown but closely related 
documents in a given field, you have a well-defined problem space, and 
you are doing knowledge engineering -- the whole process above.  And if 
you are building an ontology to enable particular kinds of communication 
among a certain set of "business partners", you have a well-defined 
problem space, and you are doing the whole process above.    (040)

> and I would recommend ISO standard Prolog as a far better language for
> such work than OWL, RuleML, or even CLIF or CGIF.    (041)

We are talking past one another again.  Logic programming is only one of 
several choices of "class of engine" for a knowledge engineering 
problem.  Whether it is the right choice depends on what the problem is.    (042)

-Ed    (043)

-- 
Edward J. Barkmeyer                        Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263                Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263                FAX: +1 301-975-4694    (044)

"The opinions expressed above do not reflect consensus of NIST,
  and have not been reviewed by any Government authority."    (045)
