ontolog-forum

Re: [ontolog-forum] Foundation Ontology

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Neil Custer" <neil.custer@xxxxxxxxx>
Date: Thu, 28 Aug 2008 09:10:30 -0500
Message-id: <7a5bda850808280710o47a176d3k9e2ddcf160c5ba2e@xxxxxxxxxxxxxx>
Thanks for the reply, John.

You certainly raise valid and interesting points, and I have no doubt that the Google developers know what they are doing.  I also understand that compiled data is more compact in a binary format and, as a result, much quicker to process.  I will not dispute that too many developers try to use XML as a "silver bullet"; many applications either process large volumes of data or are very time critical.  In those cases the data packages must be clearly defined and compiled for pure processing speed, as in your colleague's telecom app.

A few additional thoughts:

Google developers are not building a Foundation Ontology on which they expect the rest of the knowledge representation world to base their work (at least not that we know of at this time).  If they are, perhaps they could share their insight with us...

Given the similarities in the two formats as posed in your original post, one could fairly easily build a conversion application that applies the rules for constructing .proto files from an .owl or .rdf file.  Perhaps the central issue is the separation of the information from the structure.  This would (in my limited experience of working with ontologies) be the equivalent of separating the instances from the knowledge representation structure.  However, my limited experience with knowledge bases may be the source of my confusion here.
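To make the conversion idea a little more concrete, here is a rough sketch of the kind of rule I have in mind.  It does not parse a real .owl or .rdf file; the class description is a hand-written list of (property, datatype) pairs, and the XSD_TO_PROTO table and the class_to_proto function are names I made up for illustration.  The datatype mapping is an assumption on my part, not an established convention.

```python
# Hypothetical mapping from XSD datatypes to protocol buffer scalar types.
XSD_TO_PROTO = {
    "xsd:string": "string",
    "xsd:integer": "int64",
    "xsd:boolean": "bool",
    "xsd:double": "double",
}


def class_to_proto(class_name, datatype_properties):
    """Render one class and its datatype properties as a proto2 message."""
    lines = [f"message {class_name} {{"]
    # Field tags are simply assigned in order, starting at 1.
    for tag, (prop, xsd_type) in enumerate(datatype_properties, start=1):
        proto_type = XSD_TO_PROTO.get(xsd_type, "string")  # unknown types fall back to string
        lines.append(f"  optional {proto_type} {prop} = {tag};")
    lines.append("}")
    return "\n".join(lines)


# Stand-in for what would be extracted from an ontology file.
person = [("name", "xsd:string"), ("age", "xsd:integer")]
print(class_to_proto("Person", person))
```

Only the structure is carried across here; the instances would travel separately as encoded messages.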

I've thought for a while that having instance values included in the structure of an ontology seemed an inefficient way to put an effective knowledge representation system together.  As a somewhat simplified example, when using a relational database, one doesn't look at the database schema at the same time as the information filling the rows in the tables.  The only time one needs to know the names of those tables and columns is when building queries, views, and reports.
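The database analogy can be shown in a few lines with SQLite from the Python standard library.  The person table and its contents are invented for the example; the point is only that the column names and the rows are fetched by separate requests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [("Ada", 36), ("Alan", 41)])

# The structure: consulted when building queries, views, and reports.
# Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]

# The instances: plain tuples with no column names attached.
rows = conn.execute("SELECT * FROM person ORDER BY name").fetchall()

print(columns)
print(rows)
```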

I wholeheartedly agree with you that "the data descriptors should be logically accessible when needed, [AND] independent of their physical location."  Because the value of an ontology lies in its knowledge representation structure, the representation of a concept needs to be clear to the user of that structure whenever a specific application taps that knowledge.  That being the case, there should be a DOI or some other persistent identifier that a knowledge application builder can use to refer to a concept without having to pull an entire ontological structure (or even parts of it, necessarily) into their application.
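A minimal sketch of what I mean by referring to a concept through an identifier follows.  The CONCEPT_REGISTRY dictionary stands in for whatever resolver service would actually exist, and the "example:concept/Person" identifier and the resolve function are made up; none of this corresponds to a real DOI or a real registry.

```python
# Stand-in for a resolver service; in practice this would be a remote lookup.
CONCEPT_REGISTRY = {
    "example:concept/Person": {
        "label": "Person",
        "definition": "A human being.",
    },
}


def resolve(concept_id):
    """Return the descriptor for a persistent identifier, or None if unknown."""
    return CONCEPT_REGISTRY.get(concept_id)


# The application record carries only the identifier, not the ontology.
record = {"concept": "example:concept/Person", "value": "Neil"}

# The descriptor is fetched only when someone actually needs it.
descriptor = resolve(record["concept"])
print(descriptor["label"])
```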



On Thu, Aug 28, 2008 at 1:33 AM, John F. Sowa <sowa@xxxxxxxxxxx> wrote:
Neil,

I would give the Google developers credit for knowing what they are
doing, especially in version 2.0 of a format that they use for very
large volumes of data.  As they say, XML is good for data embedded
in text documents.  Otherwise, there are better choices.

 > The site states: "this message is encoded to the protocol buffer
 > binary format (the text format above is just a convenient
 > human-readable representation for debugging and editing)".

That merely means that the source format is the one that should
be used for development (i.e., debugging, editing, and sharing).

 > This means that no one other than a programmer will ever be able
 > to see that "more readable" form unless the developer goes to the
 > trouble to expose it in their application in that format.

That is true of any notation that is compiled for further processing.
Developers use the source form.  For any application, no one other
than a programmer ever sees anything except the results that the
program generates.

 > This format does not express the structure with the message content,
 > one of the greatest strengths of XML and the languages based on it
 > such as RDF and OWL.

The data descriptors should be logically accessible when needed,
independent of their physical location.  Following is an excerpt from
a note I sent to a different forum in which similar issues were raised.

John
______________________________________________________________________

 > A general return to non-self-documenting formats popular
 > when processors were much more limited than they are today
 > seems like a step in the wrong direction.

The crucial point is *accessible*.  The supporting tools must make
the descriptors accessible whenever they're needed, but that does
not imply that they must be physically stored and duplicated with
every data instance.
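As an illustration of that distinction (a sketch added for clarity, with invented sensor readings and field names), the following compares a self-describing encoding, where the names travel with every instance, against a packed encoding whose descriptors are stored once and reattached on demand.  JSON and Python's struct module are stand-ins here, not the formats discussed in this thread.

```python
import json
import struct

readings = [(1, 20.5), (2, 21.0), (3, 19.75)]

# Self-describing: field names are repeated with every instance.
self_describing = b"".join(
    json.dumps({"sensor_id": sid, "temperature": temp}).encode("utf-8")
    for sid, temp in readings
)

# Descriptors stored once; instances packed as little-endian int32 + float64.
schema = ("sensor_id", "temperature")
record_format = "<id"
record_size = struct.calcsize(record_format)
packed = b"".join(struct.pack(record_format, sid, temp) for sid, temp in readings)

# The descriptors stay accessible: unpack and reattach the names when needed.
decoded = [
    dict(zip(schema, struct.unpack(record_format, packed[i:i + record_size])))
    for i in range(0, len(packed), record_size)
]

print(len(self_describing), len(packed))
```

Nothing is lost in the packed form as long as the tools keep the schema within reach of whoever reads the data.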

Furthermore, the processors have never become so fast that it's
OK to be wasteful.  If you're processing thousands of data elements,
the time may be negligible compared to human response time.  But if
you have to make a decision to switch data streams in milliseconds
or microseconds, many applications cannot afford to use XML.

If you are storing and processing petabytes or more of data, a factor
of 2 or 10 can make a difference of multimillions of dollars in
hardware cost.  Google and many other companies know that.

But if you really need to keep the descriptors physically close, then
there is an alternative that is more expressive than RDF(S) or OWL
and about a factor of 10 more compact.   It's also an ISO standard:
Common Logic in either the CGIF or CLIF notations.

My colleague Arun Majumdar demonstrated that point on a project as
a consultant for a major telecom.  They had tried to use XML for data
encoding in a task that required switching data streams in millisecond
times.  But they couldn't respond within the time limits.  Arun
replaced the XML encoding with conceptual graphs and met the time
constraints.  The CG version went into production.

John


_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx




--
Neil
