Comments in-line.
Patrick Cassidy
MICRA, Inc.
908-561-3416
cell: 908-565-4053
cassidy@xxxxxxxxx
From:
ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx]
On Behalf Of Neil Custer
Sent: Thursday, August 28, 2008 2:59 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Foundation Ontology
Examples in the metadata of a
database schema can be helpful as well. This is why schemas should always
be well-annotated. Thus, if I were defining a month field in a database and
named it Accrual_Month with a format of CHAR(3), I would also include examples
such as "Jan", "Sep", or "Oct", but not all-caps forms such as "NOV" or "DEC".
[[PC]] In an ontology
there will be very few types whose instances consist of a mere enumeration of
strings. January, February, etc. are themselves types of recurring time
interval, and have some well-defined properties that should be specified –
i.e. they are first-class entities, not strings. Thinking of an ontology
as just another form of database schema is likely to lead to inadequate
semantics for applications that need something more than a traditional database.
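For instance, here is a minimal sketch with rdflib (the namespace and names are
purely illustrative) of treating January as a first-class type of recurring time
interval, with its own properties, rather than as the string "Jan":

# A minimal sketch, assuming rdflib; the namespace and names are invented.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, OWL, XSD

EX = Namespace("http://example.org/calendar#")
g = Graph()
g.bind("ex", EX)

# January is a class (a type of recurring time interval), not the string "Jan".
g.add((EX.Month, RDF.type, OWL.Class))
g.add((EX.Month, RDFS.subClassOf, EX.RecurringTimeInterval))
g.add((EX.January, RDF.type, OWL.Class))
g.add((EX.January, RDFS.subClassOf, EX.Month))

# Well-defined properties of the type can be stated explicitly.
g.add((EX.January, EX.monthNumber, Literal(1, datatype=XSD.integer)))
g.add((EX.January, EX.abbreviation, Literal("Jan")))

# A particular month, e.g. January 2008, is then an instance of the type.
g.add((EX.January2008, RDF.type, EX.January))

print(g.serialize(format="turtle"))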
However, you've confused me as to how examples are useful to test the validity
of the ontology structure. I thought consistency checks were to see if
instances were possible based on the structure. If what you are alluding to
is the "human" check, then example instances can be entered during
that validity check performed during construction and/or maintenance of the
ontology, correct?
[[PC]] One form of
consistency check does look at “disjoint” statements: where some type is a
subtype of disjoint types, any instance of it would be inconsistent. But
there are other kinds of consistency checks that pick up errors in various
kinds of usage, including some subtle ones that propagate down an inheritance
chain from constraints at a high-level type. Instances are useful for picking
up such subtle issues. As a practical matter, having examples of how
instances are described can make an ontology easier to use by providing
templates that can be copied and modified for instances that are close in
meaning to existing types.
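As a toy illustration (not any particular reasoner, and all type names are
invented), the disjointness check described above can be sketched like this:

# Toy sketch of the check: a type that is a subtype of two types declared
# disjoint cannot have any consistent instance.
subtype_of = {
    "Cat": {"Mammal"},
    "Robot": {"Artifact"},
    "RoboCat": {"Cat", "Robot"},          # suspicious multiple inheritance
    "Mammal": {"LivingThing"},
    "Artifact": {"NonLivingThing"},
}
disjoint_pairs = {frozenset({"LivingThing", "NonLivingThing"})}

def ancestors(t):
    """All supertypes of t, including t itself."""
    seen, stack = {t}, [t]
    while stack:
        for parent in subtype_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def unsatisfiable_types():
    """Types whose set of ancestors contains a declared disjoint pair."""
    bad = []
    for t in subtype_of:
        ups = ancestors(t)
        if any(pair <= ups for pair in disjoint_pairs):
            bad.append(t)
    return bad

print(unsatisfiable_types())   # ['RoboCat']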
Also, let's say you want to retain some sample instances in the ontology
-- does that have to mean that all instances are directly stored as
part of the ontology?
[[PC]] No, I specifically
mentioned *examples*, and in my first sentence stated that the application data
need not be part of the ontology. In the ontology systems I have worked
with there is no need to differentiate the instances from the type structure of
the ontology – in fact, the types themselves are instances of metatypes,
and both types and instances are smoothly handled by the implementation.
But if one wants to use an ontology for multiple specialized and separate
applications, it is simple enough to have the instances in each application
defined in a separate file that serves as an extension to the ontology of
types. One could also have all example instances (created for the
purposes I mentioned) in an extension, so they don’t *have to* be part of
the “ontology”, if you want to differentiate the T-box and A-box.
But the point I am making is that the ontology should be tested with a number
of instances, and having instances that are well-known entities from the real
world helps to “ground” the ontology in its intended external referents
so that it is not merely some closed set of symbols that could mean anything at
all, or nothing. The point is to make the ontology easier for the human
ontologist to understand.
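As a minimal sketch of the extension-file arrangement just described, assuming
rdflib and hypothetical file names, the type structure (T-box) and the example
instances (A-box) can live in separate files and be merged only when needed:

# A minimal sketch; rdflib is assumed and the file names are hypothetical.
from rdflib import Graph

ontology = Graph()
ontology.parse("core-types.owl", format="xml")        # hypothetical T-box file
ontology.parse("example-instances.rdf", format="xml") # hypothetical A-box extension

# The merged graph can now be queried as a single ontology.
print(len(ontology), "triples after merging the type and instance files")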
This is wildly different from
data in a database or information encoded in an XML document that is built
according to a schema. In both cases, whether a database schema or an XML
schema, the schema is the intension of the data to be contained in the
construct, whereas the rows in the tables and the information in the XML
document are the extension. Why can't this same concept apply to ontology
construction?
[[PC]] As I said, if you feel that you must, you could structure it that way,
but if you use metatypes that have instances that are types of things in your
database, it will not be used quite like a traditional database. Some of
the table names will wind up as “data” in some of the tables.
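A toy sqlite3 sketch (all names invented) of that last point, where the type
level ends up as rows of data rather than as fixed table names:

# Toy sketch: when types are themselves instances of metatypes, the "type"
# level is data, not schema.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE types (name TEXT PRIMARY KEY, supertype TEXT)")
db.execute("CREATE TABLE instances (name TEXT PRIMARY KEY, "
           "type TEXT REFERENCES types(name))")

# The type hierarchy is stored as rows, not as table names.
db.executemany("INSERT INTO types VALUES (?, ?)",
               [("Month", "RecurringTimeInterval"), ("January", "Month")])
db.executemany("INSERT INTO instances VALUES (?, ?)",
               [("January2008", "January")])

for row in db.execute("SELECT i.name, i.type, t.supertype "
                      "FROM instances i JOIN types t ON i.type = t.name"):
    print(row)   # ('January2008', 'January', 'Month')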
==========================
On Thu, Aug 28, 2008 at 10:16 AM, Patrick Cassidy <pat@xxxxxxxxx> wrote:
Neil,
A brief reply on
one point you raised:
[NC] I've thought for a while that having instance values included in the
structure of an ontology seemed an inefficient way to put an effective
knowledge representation system together. In a somewhat simplified
example, when using a relational database, one doesn't look at the database
schema at the same time one looks at the information filling the rows in tables.
The only time they need to know the names of those tables and columns is when
they are building queries, views, and reports.
Although the data of interest in
applications need not be directly part of an ontology, I believe that it is
useful to include some well-known instances of things as examples of types, so
as to make the meanings of terms clear. This is easiest with
physical objects and historical events. Instances are also useful
as a means to test the validity of the ontology structure, since some
consistency tests do not show errors until instances are described with the
relations.
Pat
Patrick Cassidy
MICRA, Inc.
908-561-3416
cell: 908-565-4053
cassidy@xxxxxxxxx
Thanks for the reply, John.
You certainly raise valid and interesting points and I have no doubt that
Google developers know what they are doing. I also certainly understand
that any compiled data is going to be more compact in a binary format and as a
result, much quicker to process as well. I will not dispute that too many
developers try to use XML as a "silver bullet"--there are certainly
many, many applications that either process large volumes of data or are very
time-critical. In those cases the data packages certainly must be clearly
defined and compiled so that they are optimized for pure processing speed,
as in your colleague's telecom app.
A few additional thoughts:
Google developers are not building a Foundation Ontology on which they expect
the rest of the knowledge representation world to base their work (at least
as far as we know at this time). If they are, perhaps they could share
their insight with us...
Given the similarities in the two formats as posed in your original post, one
could fairly easily build a conversion application that applies the rules for
constructing .proto files from an .owl or .rdf file. Perhaps the central
issue is the separation of the information from the structure. This would
(in my limited experience of working with ontologies) be the equivalent of
separating the instances from the knowledge representation structure.
However, my limited experience with knowledge bases may be the source of my
confusion here.
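As a rough sketch of such a conversion application, assuming rdflib and a
much-simplified set of mapping rules, one might read class and
datatype-property declarations from an .owl file and emit protocol-buffer
message skeletons:

# Rough sketch only; rdflib is assumed, the mapping rules are simplified,
# and the input file name is hypothetical.
from collections import defaultdict
from rdflib import Graph
from rdflib.namespace import RDF, RDFS, OWL, XSD

XSD_TO_PROTO = {XSD.string: "string", XSD.integer: "int64",
                XSD.boolean: "bool", XSD.double: "double"}

def owl_to_proto(path):
    g = Graph()
    g.parse(path, format="xml")
    fields = defaultdict(list)
    # Map each datatype property to a field on its domain class.
    for prop in g.subjects(RDF.type, OWL.DatatypeProperty):
        domain = g.value(prop, RDFS.domain)
        rng = g.value(prop, RDFS.range)
        if domain is not None:
            fields[domain].append((XSD_TO_PROTO.get(rng, "string"),
                                   prop.split("#")[-1]))
    lines = []
    # Emit one message skeleton per declared class.
    for cls in g.subjects(RDF.type, OWL.Class):
        lines.append("message %s {" % cls.split("#")[-1])
        for i, (ptype, pname) in enumerate(fields.get(cls, []), start=1):
            lines.append("  optional %s %s = %d;" % (ptype, pname, i))
        lines.append("}")
    return "\n".join(lines)

print(owl_to_proto("sample-ontology.owl"))   # hypothetical input file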
I've thought for a while that having instance values included in the structure
of an ontology seemed an inefficient way to put an effective knowledge
representation system together. In a somewhat simplified example, when
using a relational database, one doesn't look at the database schema at the
same time one looks at the information filling the rows in tables. The only time
they need to know the names of those tables and columns is when they are
building queries, views, and reports.
I wholeheartedly agree with you that "the data descriptors should be
logically accessible when needed, [AND] independent of their physical
location." With the value of the knowledge representation structure
at the heart of an ontology, the representation of a concept in the ontology
needs to be clear to the user of the structure when a specific application of
that knowledge is being tapped. That being the case, there should be a DOI
or some other persistent identifier that a knowledge application builder can
use to refer to a concept without having to pull an entire ontological
structure (or even, necessarily, parts of it) into their application.
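A minimal sketch of that idea, assuming rdflib and a hypothetical concept IRI
that is published as linked data: the application dereferences only the
identifier for the concept it needs, rather than loading the whole ontology:

# Minimal sketch; the concept IRI is hypothetical, and dereferencing it
# only works when the server actually publishes RDF at that address.
from rdflib import Graph, URIRef

concept = URIRef("http://example.org/ontology#AccrualMonth")   # hypothetical IRI

g = Graph()
g.parse(str(concept))   # many linked-data servers return just a description of the resource

# Keep only the statements about the concept itself.
for predicate, obj in g.predicate_objects(subject=concept):
    print(predicate, obj)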
On Thu, Aug 28, 2008 at 1:33 AM, John F. Sowa <sowa@xxxxxxxxxxx> wrote:
Neil,
I would give the Google developers credit for knowing what they are
doing, especially in version 2.0 of a format that they use for very
large volumes of data. As they say, XML is good for data embedded
in text documents. Otherwise, there are better choices.
> The site states: "this message is encoded to the protocol buffer
> binary format (the text format above is just a convenient
> human-readable representation for debugging and editing)".
That merely means that the source format is the one that should
be used for development (i.e., debugging, editing, and sharing).
> This means that no one other than a programmer will ever be able
> to see that "more readable" form unless the developer goes to the
> trouble to expose it in their application in that format.
That is true of any notation that is compiled for further processing.
Developers use the source form. For any application, no one other
than a programmer ever sees anything except the results that the
program generates.
> This format does not express the structure with the message content,
> one of the greatest strengths of XML and the languages based on it
> such as RDF and OWL.
The data descriptors should be logically accessible when needed,
independent of their physical location. Following is an excerpt from
a note I sent to a different forum in which similar issues were raised.
John
______________________________________________________________________
> A general return to non-self-documenting formats popular
> when processors were much more limited than they are today
> seems like a step in the wrong direction.
The crucial point is *accessible*. The supporting tools must make
the descriptors accessible whenever they're needed, but that does
not imply that they must be physically stored and duplicated with
every data instance.
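A small toy illustration (not Google's format; the record layout is invented)
of that separation: the field descriptors are stored once, each record is
packed compactly, and an equivalent self-describing XML encoding repeats the
descriptors in every instance:

# Toy comparison only; the record layout and field names are invented.
import struct
import xml.etree.ElementTree as ET

descriptor = [("id", "I"), ("price_cents", "I"), ("qty", "H")]   # stored once
fmt = "<" + "".join(code for _, code in descriptor)

record = {"id": 42, "price_cents": 1999, "qty": 3}

# Compact encoding: the descriptors are not repeated with the data.
packed = struct.pack(fmt, *(record[name] for name, _ in descriptor))

# Self-describing encoding: every instance carries the field names.
elem = ET.Element("record")
for name, _ in descriptor:
    ET.SubElement(elem, name).text = str(record[name])
xml_bytes = ET.tostring(elem)

print(len(packed), "bytes packed vs", len(xml_bytes), "bytes as XML")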
Furthermore, the processors have never become so fast that it's
OK to be wasteful. If you're processing thousands of data elements,
the time may be negligible compared to human response time. But if
you have to make a decision to switch data streams in milliseconds
or microseconds, many applications cannot afford to use XML.
If you are storing and processing petabytes or more of data, a factor
of 2 or 10 can make a difference of many millions of dollars in
hardware cost. Google and many other companies know that.
But if you really need to keep the descriptors physically close, then
there is an alternative that is more expressive than RDF(S) or OWL
and about a factor of 10 more compact. It's also an ISO standard:
Common Logic in either the CGIF or CLIF notations.
My colleague Arun Majumdar demonstrated that point on a project as
a consultant for a major telecom. They had tried to use XML for data
encoding in a task that required switching data streams in millisecond
times. But they couldn't respond within the time limits. Arun
replaced the XML encoding with conceptual graphs and met the time
constraints. The CG version went into production.
John
--
Neil
--
Neil
Life is Great!-- It sure beats the alternative!!
So live better - live longer! See our exclusive health and nutrition products
at:
http://ncuster.qhealthzone.com
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx