John, (01)
I think we are pretty much in agreement. We came at this from different
viewpoints, with different prejudices, and the clarifications have
revealed that we have reached many of the same conclusions, which may
only mean that we are both wrong on some points. (02)
Some clarifications: (03)
you wrote: (04)
> Note that computational complexity, compiler-based methods, and
> methodologies are very distinct issues. Restricting expressiveness
> is not a silver bullet that can hit the target on all three. I
> believe that a lot of money and R & D effort was wasted in trying. (05)
The lead-in to this discussion was "computational complexity", which was
what I was talking about. I don't think rolling these concerns up in
one blanket has been anyone's objective. Bounding computational
complexity is the big rationale for Description Logic and all of its
variants, and the key criterion for admissibility of extensions and
variants in the DAML->OWL program, and many related Semantic Web activities. (06)
The methodology part for OWL and friends is only beginning to emerge,
and I don't think it was a target of the DAML->OWL program. Methodology
development based on the "common upper ontology" approach is probably
the dominant form in FOL land, although I don't doubt there are others.
And I understand "compiler-based methods" to refer to logic
programming, a la Prolog, JESS and Jena, which is certainly knowledge
engineering, but entirely different from "ontology development" a la OWL
or CLIF in my view. (07)
So this is a very mixed bag, and we are talking past each other. (08)
> EB> Let's start with Horn clauses and rules engines and work our way
> > through description logics to FOL reasoners. It seems to me they
> > all restrict the KR language.
>
> Of course they do. But not for all three of the above reasons. (09)
I misunderstood what you were saying. The purpose of all of these
restrictions is to enable the reliable behavior of certain algorithms
for reasoning. That in turn guarantees that knowledge engineered in
that way can be used with known tooling to achieve practical results. (010)
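To make that concrete, here is a toy sketch (the rules and atom names are invented for illustration, not from any real system) of why the Horn restriction buys reliable behavior: saturation over propositional Horn clauses always terminates, in time polynomial in the number of rules.

```python
# Forward chaining over propositional Horn clauses. Each rule is
# (body_atoms, head_atom); a fact is a rule with an empty body.
# Because derived atoms only accumulate, the loop must terminate.

def horn_closure(rules):
    """Return the set of atoms derivable from the given Horn rules."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(a in derived for a in body):
                derived.add(head)
                changed = True
    return derived

rules = [
    ((), "socrates_is_man"),                       # fact
    (("socrates_is_man",), "socrates_is_mortal"),  # man -> mortal
]
print(horn_closure(rules))
```

Allow disjunctive heads or full negation and this guarantee evaporates; that is the trade the restricted languages are making.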
> EB> Have you never heard knowledge engineers talk about
> > "avoiding disjunctions" or "adding helper axioms" or "steering
> > the reasoner"? Outside of DL land, I have never met an AI student
> > who doesn't understand those concepts, and the need for them.
>
> Anybody who is doing that is actually writing programs in an
> inadequate version of Prolog. I strongly approve of using Prolog
> as a very high-powered *programming* language, which happens to
> have some features of logic. (011)
Actually, the people who do that are building "ontologies" for Vampire
or Otter or some other FOL reasoner. And their mode of expression is
some FOL language in the CLIF/KIF family. (012)
The ontologies are intended to be "general-purpose", and the reasoners
may or may not contain "subproblem specialization" features. But when
tasked with getting some practical results from their ontology, the
engineers often find themselves confronting the problem of accidental
computational complexity.  The algorithms used by the reasoner, coupled
with the ontology as written and the hypothesis proposed, send the
reasoner off into the multi-hour result-free state that would be
"undecided" in cricket. But only for 2 of the 25 interesting test
hypotheses. (013)
The approaches I mentioned add a few additional lemmas or guidelines to
help the reasoner get over the hump, explicitly for the purpose of using
the ontology and the reasoner to solve the particular problems at hand.
And in the grand scheme of things, this is not really different from
the situation in which the knowledge engineer devises work-arounds to
get an OWL ontology and a DL reasoner to solve the problems. The OWL
engineer puts an ugly wart in the ontology, but knows that it guarantees
computability of the target results in the problem space. The KIF
engineer tinkers with the engine inputs until s/he gets computability in
tolerable time. (014)
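A toy illustration of the "helper lemma" effect (the chain and the lemma are invented; no real reasoner works exactly this way): a depth-limited backward chainer that cannot reach the goal within its search bound, until a lemma collapses the middle of the derivation.

```python
# Depth-limited backward chaining over propositional Horn rules:
# (body_atoms, head_atom), with facts as empty-body rules.

def prove(goal, rules, depth):
    """True if `goal` is derivable within `depth` rule applications."""
    if depth < 0:
        return False
    return any(head == goal and all(prove(a, rules, depth - 1) for a in body)
               for body, head in rules)

# A 5-step derivation chain from a base fact a0 up to the goal.
base_rules = [((), "a0")] + \
    [((f"a{i}",), f"a{i+1}") for i in range(4)] + [(("a4",), "goal")]

helper = (("a0",), "a4")   # a lemma that shortcuts the middle of the chain

print(prove("goal", base_rules, 3))             # bound too tight: fails
print(prove("goal", base_rules + [helper], 3))  # same bound now succeeds
```

The lemma adds no new consequences; it only makes one of them cheap enough to find, which is exactly the "wart" in engineering terms.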
And the good thing about the ugly wart is that you can see it and the
paper explains why.  By comparison, the KIF engineer's paper rarely, if
ever, mentions the tinkering, and when it finally works, s/he may not
know why. (015)
That was the counterpoint to the fact that the CLIF(-like) formulation
is straightforward and easier to read and understand. (016)
And all of this engineering is "programming", in the sense that it is
building a software machine to solve a set of problems. The software
machine comprises the engine, the control settings, and the ontology.
In this area, the difference between "ontology engineering" and "logic
programming" is that the "ontology" nominally captures knowledge in a
problem-independent way, and to the extent that that is true, the
ontology is reusable, while the "logic program" captures the relevant
knowledge in a problem-specific way, more or less "from the get-go".
But at the far end of the process, the "ontology engineer" does
"programming" processes to get problem solutions out of the
would-be-reusable ontology -- it has to be usable before it can be reusable. (017)
> EB> It is all about tuning the reasoning process to a class of
> > problem, after you realize that writing down the problem the way
> > you understood it did not produce acceptable performance from
> > the reasoner(s).
>
> Once you do that, you are no longer writing an ontology. You
> are writing a program. (018)
I agree completely. You are continuing the engineering process to
achieve its objective -- solving a given set of problems. (019)
> Modern DLs were introduced by Bill Woods in 1975 and Ron Brachman
> in 1979. That's 30 years of R & D. There are always finer and
> finer incremental improvements, but the basic points are clear. (020)
Yes, and by 1990, there were a dozen published extensions that enabled
solution of different kinds of problems. The main contribution of the
DAML research was to find maximal combinations of the published
extensions that, taken together, still yielded a computationally bounded
algorithm.  And there are useful extensions, or sets of extensions, that
were shown to be incompatible in that regard. And as you might guess,
the maximal sets included a few tweaks and constraints on the published
extensions they included. (021)
> One of the most significant points is that the issues of decidability
> and computational complexity, which are very important in theory, are
> a red herring when it comes to practice. The people working on Cyc
> have found that undecidable problems, which are theoretically possible
> with a rich language such as CycL, rarely, if ever, occur in practice. (022)
True. But what does occur in practice is the reasoner running for an
hour without a result, and there being no way to know whether it would
find a result in 6 more minutes, or 6 more days, or never. (023)
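The practical response is to bound the run explicitly and accept "unknown" as an answer. A minimal sketch of that pattern (the "prover" here is a stand-in generator, not any real reasoner's interface):

```python
import itertools

def bounded_search(steps, budget):
    """Consume up to `budget` items from a prover's step iterator.
    Each step yields None while working, or a final verdict string."""
    for step in itertools.islice(steps, budget):
        if step is not None:
            return step
    return "unknown"   # budget spent: the answer may be 6 minutes or 6 days away

def toy_prover(n):
    """Stand-in for a saturation prover that finds its answer after n steps."""
    for _ in range(n):
        yield None
    yield "proved"

print(bounded_search(toy_prover(10), budget=100))     # finishes in budget
print(bounded_search(toy_prover(10**9), budget=100))  # gives up: unknown
```

The wrapper never distinguishes "slow" from "never", which is precisely the complaint: undecidability rarely bites, but indistinguishability from it bites constantly.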
> The major difficulties they encountered:
>
> 1. Managing and organizing a very large knowledge base and finding
> the relevant axioms necessary to solve any specific problem.
>
> 2. Missing information that had not been anticipated by the kn.
> engineers who developed the KB.
>
> 3. Mismatches between the ontology as implemented and the raw data
> and problems that must be addressed.
>
> 4. Nonmonotonic issues of defaults, exceptions, and uncertainty. (024)
Welcome to knowledge engineering in the real world.
It is this kind of experience that soured many sponsors on knowledge
engineering 20 years ago. As ugly as it may be, the OWL effort is
making a lot more people aware of what can be done, and of course,
encountering problems like these. #1 is a big problem for medical
ontologies; #3 is a big problem for the intelligence community. (025)
> Those practical problems are vastly more significant than the
> hundreds of papers and dissertations that prove that a certain
> algorithm on a certain kind of problem is "tractable" -- that
> means solvable in polynomial time. Those problems #1 to #4 with
> Cyc are all "tractable". But polynomial time is not good enough. (026)
I don't disagree.  The difficulty with problems 1-4 above is that they
are not theoretical problems.  They are specific to the problem, the
problem space, and the problem environment. (027)
[Aside: As long as the criterion for getting the degree in engineering
is the quality of the mathematics rather than the complexity of the
actual problem it solves, we will have this divergence between theses
and value. But this is the accepted situation in most engineering
disciplines, as distinct from the sciences. The advanced degree
demonstrates that the engineer understands the theory of his/her
discipline, can commit to intensive research, and can devise something
new, however small. By comparison, a grunt engineer can solve a lot of
industrial problems without having any of those attributes.] (028)
> When Knuth was talking about "premature optimization", he meant
> that nobody can know in advance what optimization is needed.
> Current theorem provers are fairly good at "mature optimization":
>
> As an example, consider the WHERE-clause of an SQL query, which
> has the expressive power of full FOL. (029)
and the added simplification called 'negation as failure'. (030)
> That same WHERE-clause
> can be used with many different SELECT statements. Depending
> on which columns of which tables are being selected, very
> different optimizations are necessary. A premature choice of
> optimization can take orders of magnitude more time than the
> correct choice. (031)
I would say that is contrived. There is a lot of research in this area,
and it is all about optimizing performance of the whole query, taking
into account the projected elements as well as the selection criteria,
and the distribution of the visited elements over the (possibly
distributed) tables. (032)
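The effect is easy to observe even in a small engine like SQLite: with a composite index in place, the same WHERE clause gets a different access path depending on which columns are projected. A minimal sketch (the schema and index are invented for illustration; exact plan text varies by SQLite version):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT,
                         total REAL, placed TEXT);
    CREATE INDEX idx_customer_total ON orders (customer, total);
""")

where = "WHERE customer = 'acme' AND total > 100"

for select in ("SELECT customer, total FROM orders ",   # covered by the index
               "SELECT placed FROM orders "):           # needs a row lookup
    print(select + where)
    for row in con.execute("EXPLAIN QUERY PLAN " + select + where):
        print("   ", row[-1])   # the optimizer's chosen access path
```

The first projection can be answered from the index alone; the second forces a visit to the base table for each match, so the planner reports a different strategy for an identical WHERE clause.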
The premature optimization in databases occurs in the design of
object-oriented databases, in which the information groups (objects) are
pre-linked according to a particular view of the problem space. This
makes it very easy to implement queries that fit that view, and very
complex and expensive to construct paths that collect the information
according to a different view. (033)
> Exactly the same issues are involved in a theorem prover.
> In effect, each axiom of an ontology is similar to a WHERE-
> clause of an SQL statement, but unlike an SQL query, the WHERE-
> clause does not have an attached SELECT-clause. That means it is
> *impossible* for a machine or even an educated human to determine
> in advance which optimization is appropriate. The correct choice
> can only be determined at the instant when the axiom is invoked. (034)
I am more than a bit suspicious of this analogy, but I won't argue. (035)
> I agree that there are many crufty things in Cyc, and many things
> I would do differently. But I don't want to get started in commenting
> on Cyc, which has gone through 24 years of evolution. (036)
But the whole knowledge engineering trade has gone through 40 years of
evolution. And among our tools we have amphibians and birds and
mammals, and each has its advantages and its weaknesses. After 40
years, they are all good at what they do well, and less weak in other
areas than their ancestors. But there are enough problems to feed all
of CLIF and OWL and Prolog and Jena, and for the most part, the issues
we have discussed are those that define the food groups. (037)
> EB> It is all about tuning the reasoning process to a class of
> > problem, after you realize that writing down the problem the way
> > you understood it did not produce acceptable performance from
> > the reasoner(s).
>
> I would not call that writing an ontology. I would call it programming, (038)
I call the whole process "knowledge engineering" -- analyzing the
problem space and the problems given, choosing the class of engine,
writing the ontology, debugging it, and then tweaking the ontology and
the engine to solve the problems that were originally posed. You can
call any part of that process "programming" if you like, but the idea of
crafting and modifying the knowledge representation to solve the target
problems is not restricted to logic programming. The very description
of the Cyc problems above supports that. (039)
If you love Platonic upper ontologies and the capture of universal
truth, don't let me rain on your parade. But if your objective is to
develop an ontology for marking up a set of unknown but closely related
documents in a given field, you have a well-defined problem space, and
you are doing knowledge engineering -- the whole process above. And if
you are building an ontology to enable particular kinds of communication
among a certain set of "business partners", you have a well-defined
problem space, and you are doing the whole process above. (040)
> and I would recommend ISO standard Prolog as a far better language for
> such work than OWL, RuleML, or even CLIF or CGIF. (041)
We are talking past one another again. Logic programming is only one of
several choices of "class of engine" for a knowledge engineering
problem. Whether it is the right choice depends on what the problem is. (042)
-Ed (043)
--
Edward J. Barkmeyer Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263 Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263 FAX: +1 301-975-4694 (044)
"The opinions expressed above do not reflect consensus of NIST,
and have not been reviewed by any Government authority." (045)
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx (046)