ontolog-forum

Re: [ontolog-forum] Foundation Ontology

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Neil Custer" <neil.custer@xxxxxxxxx>
Date: Wed, 27 Aug 2008 14:35:05 -0500
Message-id: <7a5bda850808271235haba7ebbv9492ed8150416e33@xxxxxxxxxxxxxx>
John and forum-mates,

Unfortunately, the Google team used a very, very simple example.  I see two drawbacks worth considering before using such technology to encode the Foundation Ontology:

1.  You stated "the Google form is somewhat more readable than the XML form, but the difference in readability escalates rapidly for large examples". 

Problem:  In the text you quoted from the site, you glossed right past what is, in my opinion, a show-stopper.  The site states: "this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing)".

This means that no one other than a programmer will ever see that "more readable" form unless the developer takes the trouble to expose it, in that format, in their application.  That is inflexible, leaves the presentation to each developer's interpretation, and hides the actual format from the end user.

2.  This format does not carry the structure along with the message content, which is one of the greatest strengths of XML and of the languages based on it, such as RDF and OWL.  The Google team itself acknowledges this limitation when comparing protocol buffers with XML:  "protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file)."
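Neil's second point can be made concrete with a minimal sketch, assuming the person message declares name as field 1 and email as field 2 (both strings), and using the hypothetical address jdoe@example.com in place of the archived one. This is a hand-written decoder for the length-delimited wire type only, not Google's library: without the .proto file, all it can recover from the binary form is field numbers and raw bytes, never the names "name" and "email".

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_without_schema(buf: bytes) -> list[tuple[int, bytes]]:
    """List (field_number, raw_bytes) pairs; handles wire type 2 only."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        length, pos = read_varint(buf, pos)
        fields.append((field_number, buf[pos:pos + length]))
        pos += length
    return fields


# Hypothetical binary form of the person message, assuming
# name = field 1 and email = field 2 (both strings).
encoded = b"\x0a\x08John Doe\x12\x10jdoe@example.com"

for number, raw in decode_without_schema(encoded):
    print(number, raw)
# 1 b'John Doe'
# 2 b'jdoe@example.com'
```

The field numbers survive in the bytes, but the field names live only in the .proto file, which is exactly what the quoted Google text concedes.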

I hope my two cents are worth considering.

-- Neil

On Wed, Aug 27, 2008 at 9:57 AM, John F. Sowa <sowa@xxxxxxxxxxx> wrote:
As I said in previous notes, the Foundation Ontology should be designed
to use any or every logic-based notation as input, and the internals
should be stored in some suitable version of logic.

For complex logical expressions, a full version of logic, such as the
Common Logic standard would be necessary.  But a very large amount of
specification can be done in much simpler notations.  RDF(S) and OWL
are widely used, but their human factors leave much to be desired.

Recently, Google has released specifications and software for their
_Protocol Buffers_, which use a compact, elegant, humanly readable
notation that is also very efficient for computer processing.
Following is an example from their documentation:

   person {
     name: "John Doe"
     email: "jdoe@xxxxxxxxxxx"
   }

Following is an equivalent in XML (an RDF version would look worse):

   <person>
      <name>John Doe</name>
      <email>jdoe@xxxxxxxxxxx</email>
   </person>

For such a short example, the Google form is somewhat more readable
than the XML form, but the difference in readability escalates rapidly
for large examples.  Just as important is the computer efficiency.
Compressing XML notations by ZIP or other algorithms takes an enormous
amount of time, but the Google software is extremely fast.  Following
is their comment:

 > When this message is encoded to the protocol buffer binary format
 > (the text format above is just a convenient human-readable
 > representation for debugging and editing), it would probably be
 > 28 bytes long and take around 100-200 nanoseconds to parse. The
 > XML version is at least 69 bytes if you remove whitespace, and
 > would take around 5,000-10,000 nanoseconds to parse.
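The 28-byte and 69-byte figures in the quote can be checked by hand. The Python sketch below (standard library only, not Google's implementation) builds the wire encoding of two string fields and the whitespace-free XML, assuming field numbers 1 and 2 and a hypothetical 16-character address, jdoe@example.com, in place of the archived one. It measures size only, not parse time.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Key (field number, wire type 2), then length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(data)) + data


name, email = "John Doe", "jdoe@example.com"  # hypothetical values

binary = encode_string_field(1, name) + encode_string_field(2, email)
xml = f"<person><name>{name}</name><email>{email}</email></person>".encode("utf-8")

print(len(binary), len(xml))  # 28 69
```

Each string field costs one key byte, one length byte, and its text, so 10 + 18 = 28 bytes. For this tiny message the ratio is about 2.5 to 1, below the 3 to 10 range quoted in Google's summary.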

Following is the Google summary, which I find convincing:

 > Protocol buffers have many advantages over XML for serializing
 > structured data. Protocol buffers:
 >
 >   * are simpler
 >   * are 3 to 10 times smaller
 >   * are 20 to 100 times faster
 >   * are less ambiguous
 >   * generate data access classes that are easier to use
 >     programmatically

The three primary languages they support are Java, C++, and Python.
But other groups have implemented versions for C, C#, Perl, PHP,
Ruby, LISP, Erlang, Haskell, and ActionScript.

The above example came from the Google Developer's Guide:

   http://code.google.com/apis/protocolbuffers/docs/overview.html

At the end of this note are some quotations from Google developers.
A very important point is that the Google notation with its supporting
software has been implemented and tested on very large applications --
probably some of the largest applications in the world.  And the
software they are now making available (under the Apache license)
is already version 2.0, so the speed bumps have been smoothed out.

Recommendation: I suggest that we use the Google notation to represent
simple type hierarchies at the level of Aristotle's syllogisms.  That
is the most commonly used subset of OWL, and it can be automatically
translated to Common Logic and to every other notation that anyone has
been using for ontologies.  For more complex expressions, a richer
version of logic could be used.  But I would recommend a version of
controlled English as the *primary* notation for complex logic.  Other
notations, including OWL or Common Logic, would be compiled *from*
controlled English.
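The translation step in this recommendation can be sketched in Python under clearly stated assumptions: the record layout below (a type name plus one supertype) and the type names are hypothetical, not a Google or Ontolog schema, and the output is plain Common Logic Interchange Format (CLIF) text. Each record becomes an axiom of the form (forall (x) (if (Dog x) (Mammal x))), and chaining the links is the syllogism Barbara: every Dog is a Mammal, every Mammal is an Animal, so every Dog is an Animal.

```python
from collections import defaultdict

# Hypothetical type-hierarchy records, e.g. parsed from text such as
#   type { name: "Dog" supertype: "Mammal" }
records = [
    {"name": "Dog", "supertype": "Mammal"},
    {"name": "Mammal", "supertype": "Animal"},
    {"name": "Animal", "supertype": "Entity"},
]


def to_clif(subtype: str, supertype: str) -> str:
    """'Every subtype is a supertype' as a CLIF sentence."""
    return f"(forall (x) (if ({subtype} x) ({supertype} x)))"


def all_supertypes(records: list[dict]) -> dict[str, set[str]]:
    """Transitive closure of the supertype links (repeated Barbara)."""
    direct = defaultdict(set)
    for r in records:
        direct[r["name"]].add(r["supertype"])
    closure = {}
    for t in list(direct):
        seen, stack = set(), list(direct[t])
        while stack:  # depth-first walk; 'seen' guards against cycles
            s = stack.pop()
            if s not in seen:
                seen.add(s)
                stack.extend(direct.get(s, ()))
        closure[t] = seen
    return closure


for r in records:
    print(to_clif(r["name"], r["supertype"]))

print(sorted(all_supertypes(records)["Dog"]))  # ['Animal', 'Entity', 'Mammal']
```

Only the direct links need to be stored; the inferred ones follow mechanically, which is what makes this subset easy to translate into richer logics.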

John Sowa
_______________________________________________________________________

http://www.informationweek.com/news/internet/google/showArticle.jhtml?articleID=208803049

"It's the way we encode almost any sort of structured information which
needs to be passed across the network or stored on disk," said Chris
DiBona, Google's open source programs manager, in a blog post. "We
thought Protocol Buffers might be useful to other people, too, so we've
decided to release it as open source software."

Google software engineer Kenton Varda, in a post on the Google open
source blog, said that Google uses literally thousands of different data
formats, most of which are structured. Encoding these data formats on a
massive scale is too much for XML, so Google developed Protocol Buffers.

Varda compares Protocol Buffers to an Interface Description Language
(IDL), without the complexity. "[O]ne of Protocol Buffers' major design
goals is simplicity," said Varda. "By sticking to a simple
lists-and-records model that solves the majority of problems and
resisting the desire to chase diminishing returns, we believe we have
created something that is powerful without being bloated. And, yes, it
is very fast -- at least an order of magnitude faster than XML."

For Google's FAQ, see

http://code.google.com/apis/protocolbuffers/docs/faq.html



_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx





