Neil,
A brief reply on one point you raised:
[NC] I've thought for a while that having instance values
included in the structure of an ontology seemed an inefficient way to put an
effective knowledge representation system together. In a somewhat
simplified example, when using a relational database, one doesn't look at the
database schema at the same time one looks at the information filling the rows
of its tables. The only time one needs to know the names of those tables and
columns is when building queries, views, and reports.
Although
the data of interest in applications need not be directly part of an ontology, I
believe that it is useful to include some well-known instances of things as
examples of types, so as to make the meanings of terms clear. This
is easiest with physical objects and historical events. Instances are also
useful as a means to test the validity of the ontology structure, since some
consistency tests do not show errors until instances are described with the
relations.
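Pat's point about instances exposing structural errors can be sketched in a few lines. This is a hedged illustration only, not any particular ontology toolkit: the Ontology class, type names, and relation names below are all invented, and real consistency checking (e.g., with a DL reasoner) is far more involved. The idea is simply that a domain/range constraint on a relation cannot fail until some instance is actually described with that relation.

```python
# Hypothetical sketch: relation-level constraints that only reveal
# errors once instances are asserted. All names here are invented
# for illustration.

class Ontology:
    def __init__(self):
        self.types = {}        # instance name -> type name
        self.subtypes = {}     # type name -> set of supertype names
        self.relations = {}    # relation name -> (domain type, range type)
        self.facts = []        # (relation, subject, object) triples

    def is_a(self, typ, supertype):
        if typ == supertype:
            return True
        return any(self.is_a(t, supertype)
                   for t in self.subtypes.get(typ, ()))

    def check(self):
        """Return constraint violations; empty until instances exist."""
        errors = []
        for rel, subj, obj in self.facts:
            dom, rng = self.relations[rel]
            if not self.is_a(self.types[subj], dom):
                errors.append(f"{subj} is not a {dom} (domain of {rel})")
            if not self.is_a(self.types[obj], rng):
                errors.append(f"{obj} is not a {rng} (range of {rel})")
        return errors

onto = Ontology()
onto.subtypes["City"] = {"Place"}
onto.relations["locatedIn"] = ("Place", "Place")
print(onto.check())            # no instances yet, so no errors surface

onto.types["Paris"] = "City"
onto.types["year1789"] = "TimeInterval"
onto.facts.append(("locatedIn", "Paris", "year1789"))  # range violation
print(onto.check())
```

The structure alone passes the check; the error only appears once the ill-typed instance assertion is added.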
Pat
Patrick Cassidy
MICRA, Inc.
908-561-3416
cell: 908-565-4053
cassidy@xxxxxxxxx
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Neil Custer
Sent: Thursday, August 28, 2008 10:11 AM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Foundation Ontology
Thanks for the reply, John.
You certainly raise valid and interesting points and I have no doubt that
Google developers know what they are doing. I also certainly understand
that any compiled data is going to be more compact in a binary format and as a
result, much quicker to process as well. I will not dispute that too many
developers try to use XML as a "silver bullet"--there are certainly
many applications that either process large volumes of data or are very
time critical. In those cases the size of the data packages must be clearly
defined, and the encoding compiled and optimized for pure processing speed,
as in your colleague's telecom app.
A few additional thoughts:
Google developers are not building a Foundation Ontology on which they expect
the rest of the knowledge representation world to base their work (at least
that we know of at this time). If they are, perhaps they could share
their insight with us...
Given the similarities in the two formats as posed in your original post, one
could fairly easily build a conversion application that applies the rules for
constructing .proto files from an .owl or .rdf file. Perhaps the central
issue is the separation of the information from the structure. This would
(in my limited experience of working with ontologies) be the equivalent of
separating the instances from the knowledge representation structure.
However, my limited experience with knowledge bases may be the source of my
confusion here.
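The conversion idea above can be made concrete with a toy sketch. This is only an illustration of the structure mapping, under stated assumptions: a real converter would parse the .owl or .rdf file with an RDF library first, whereas here the input is an assumed, already-parsed list of datatype properties, and the XSD-to-proto type table is a simplification.

```python
# Toy sketch: emit a .proto message definition from a (pre-parsed)
# OWL-like class description. The input format and type mapping are
# invented simplifications, not a complete OWL-to-protobuf rule set.

def owl_class_to_proto(name, datatype_properties):
    """datatype_properties: list of (property_name, xsd_type) pairs."""
    xsd_to_proto = {
        "xsd:string": "string",
        "xsd:integer": "int64",
        "xsd:boolean": "bool",
        "xsd:double": "double",
    }
    lines = [f"message {name} {{"]
    for i, (prop, xsd) in enumerate(datatype_properties, start=1):
        proto_type = xsd_to_proto.get(xsd, "string")  # fallback: string
        lines.append(f"  optional {proto_type} {prop} = {i};")
    lines.append("}")
    return "\n".join(lines)

print(owl_class_to_proto("Person",
                         [("name", "xsd:string"), ("age", "xsd:integer")]))
```

The hard parts a real converter would face — object properties, class hierarchies, cardinality restrictions — are exactly where the two models diverge, which is why the separation of information from structure matters.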
I've thought for a while that having instance values included in the structure
of an ontology seemed an inefficient way to put an effective knowledge
representation system together. In a somewhat simplified example, when
using a relational database, one doesn't look at the database schema at the
same time one looks at the information filling the rows of its tables. The
only time one needs to know the names of those tables and columns is when
building queries, views, and reports.
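The relational analogy can be shown in a few lines with Python's built-in sqlite3 module (the table and data below are invented for illustration): the column names live in the schema and are consulted when the query is built, but the rows that come back are bare value tuples, with no schema names repeated in them.

```python
# Minimal illustration of the schema/data split in a relational store.
# Table name, columns, and rows are invented for this example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [("Paris", 2100000), ("Lyon", 520000)])

# The query references schema names; the result rows carry only values.
rows = conn.execute(
    "SELECT name FROM city WHERE population > 1000000").fetchall()
print(rows)   # [('Paris',)]
```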
I wholeheartedly agree with you that "the data descriptors should be
logically accessible when needed, [AND] independent of their physical
location." With the value of the knowledge representation structure
at the heart of an ontology, the representation of a concept in the ontology
needs to be clear to the user of the structure when a specific application of
that knowledge is being tapped. That being the case, there should be a
DOI or some other persistent identifier that a knowledge application builder
can use to refer to a concept without having to pull an entire ontological
structure (or even parts of it, necessarily) into their application.
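A minimal sketch of that suggestion, under loudly stated assumptions: the registry, the "ex:concept/..." identifiers, and the dictionary lookup below are all invented stand-ins. In practice resolving the identifier would mean dereferencing a DOI or IRI over HTTP; the point is only that the application touches one concept description at a time, never the whole ontology.

```python
# Hedged sketch: per-concept persistent identifiers resolved on demand.
# Registry contents and identifier scheme are invented for illustration.

CONCEPT_REGISTRY = {
    "ex:concept/City":  {"label": "City",  "subTypeOf": "ex:concept/Place"},
    "ex:concept/Place": {"label": "Place", "subTypeOf": None},
}

_cache = {}

def resolve(concept_id):
    """Fetch one concept's description by its persistent identifier."""
    if concept_id not in _cache:
        # In practice: an HTTP dereference of a DOI/IRI.
        # Here: a plain dictionary lookup standing in for that call.
        _cache[concept_id] = CONCEPT_REGISTRY[concept_id]
    return _cache[concept_id]

print(resolve("ex:concept/City")["label"])
```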
On Thu, Aug 28, 2008 at 1:33 AM, John F. Sowa <sowa@xxxxxxxxxxx> wrote:
Neil,
I would give the Google developers credit for knowing what they are
doing, especially in version 2.0 of a format that they use for very
large volumes of data. As they say, XML is good for data embedded
in text documents. Otherwise, there are better choices.
> The site states: "this message is encoded to the protocol buffer
> binary format (the text format above is just a convenient
> human-readable representation for debugging and editing)".
That merely means that the source format is the one that should be used
for development (i.e., debugging, editing, and sharing).
> This means that no one other than a programmer will ever be able
> to see that "more readable" form unless the developer goes to the
> trouble to expose it in their application in that format.
That is true of any notation that is compiled for further processing.
Developers use the source form. For any application, no one other
than a programmer ever sees anything except the results that the
program generates.
> This format does not express the structure with the message content,
> one of the greatest strengths of XML and the languages based on it
> such as RDF and OWL.
The data descriptors should be logically accessible when needed,
independent of their physical location. Following is an excerpt from
a note I sent to a different forum in which similar issues were raised.
John
______________________________________________________________________
> A general return to non-self-documenting formats popular
> when processors were much more limited than they are today
> seems like a step in the wrong direction.
The crucial point is *accessible*. The supporting tools must make
the descriptors accessible whenever they're needed, but that does
not imply that they must be physically stored and duplicated with
every data instance.
Furthermore, the processors have never become so fast that it's
OK to be wasteful. If you're processing thousands of data elements,
the time may be negligible compared to human response time. But if
you have to make a decision to switch data streams in milliseconds
or microseconds, many applications cannot afford to use XML.
If you are storing and processing petabytes or more of data, a factor
of 2 or 10 can make a difference of many millions of dollars in
hardware cost. Google and many other companies know that.
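A back-of-envelope illustration of that factor, with an invented record: the same two fields as tagged XML and as a protobuf-style binary encoding (one key byte per field plus a base-128 varint payload, per the protocol buffer wire format). The field numbers and values are made up; the encoding rules are the real ones, but only a fragment of them.

```python
# Size comparison: invented XML record vs. protobuf-style binary.
# varint and field-key encoding follow the protobuf wire format.

def varint(n):
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # continuation bit set
        else:
            out.append(byte)
            return bytes(out)

def length_delimited(number, payload):
    # Field key = (field_number << 3) | wire_type; wire type 2 = bytes.
    return bytes([(number << 3) | 2]) + varint(len(payload)) + payload

xml = b'<person><name>Ada</name><id>300</id></person>'
binary = (length_delimited(1, b"Ada")          # field 1: name
          + bytes([2 << 3]) + varint(300))     # field 2: varint id
print(len(xml), len(binary))
```

The descriptors (tag names, field numbers) still exist in both cases; in the binary form they live once in the .proto schema instead of being repeated in every record.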
But if you really need to keep the descriptors physically close, then
there is an alternative that is more expressive than RDF(S) or OWL
and about a factor of 10 more compact. It's also an ISO standard:
Common Logic in either the CGIF or CLIF notations.
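To give a feel for where that compactness comes from, here is one assertion written both ways. The triple is invented, and the exact factor varies with the content; the CLIF sentence follows the Common Logic Interchange Format's prefix notation for atomic sentences.

```python
# Illustrative size comparison: one invented assertion in RDF/XML
# and in CLIF. The ratio depends heavily on the content.

rdf_xml = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/John">
    <ex:loves rdf:resource="http://example.org/Mary"/>
  </rdf:Description>
</rdf:RDF>"""

clif = "(loves John Mary)"

print(len(rdf_xml), len(clif))
```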
My colleague Arun Majumdar demonstrated that point on a project as
a consultant for a major telecom. They had tried to use XML for data
encoding in a task that required switching data streams in millisecond
times. But they couldn't respond within the time limits. Arun
replaced the XML encoding with conceptual graphs and met the time
constraints. The CG version went into production.
John
--
Neil
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx