John and forum-mates,
Unfortunately, the example the Google team used is a very, very simple one. I can see two drawbacks worth considering in using such technology to encode the Foundation Ontology:
1. You stated "the Google form is somewhat more readable than the XML form, but the difference in readability escalates rapidly for large examples".
Problem: In the text you quoted from the site, you glossed right past what is, in my opinion, a 'show-stopper'. The site states: "this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing)".
This means that no one other than a programmer will ever see that "more readable" form unless the developer goes to the trouble of exposing it in that format in the application. The result is not flexible, is subject to interpretation by the developer, and hides the actual format from the end user.
2. This format does not carry the structure along with the message content, which is one of the greatest strengths of XML and the languages based on it, such as RDF and OWL. The Google team concedes that this is a consideration when looking at protocol buffers as a solution: "protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition".
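To make the second point concrete, here is a minimal sketch in Python of what a reader sees when decoding protocol buffer bytes without the message definition. It assumes the published wire format (a varint key holding the field number and wire type, followed by the value) and a hypothetical message whose field 1 holds the string "John Doe"; the function names are mine, not part of Google's library, and only two wire types are handled.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode_without_schema(buf):
    """Return (field_number, wire_type, raw_value) triples.

    Without the message definition there are no field names and no
    declared types, only numbers and raw values.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        fields.append((field_number, wire_type, value))
    return fields


# Hypothetical message: field 1 is a string holding "John Doe".
# Key byte 0x0A = (1 << 3) | 2, followed by the length 8.
encoded = bytes([0x0A, 0x08]) + b"John Doe"
print(decode_without_schema(encoded))  # [(1, 2, b'John Doe')]
```

The decoded result says only "field 1, length-delimited, these bytes". That the field is called "name", or that it is a string rather than a nested message, is known only to someone holding the message definition, whereas the XML form carries the element name along with the data.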
I hope my two cents are worth considering.
On Wed, Aug 27, 2008 at 9:57 AM, John F. Sowa <sowa@xxxxxxxxxxx> wrote:
As I said in previous notes, the Foundation Ontology should be designed
to use any or every logic-based notation as input, and the internals
should be stored in some suitable version of logic.
For complex logical expressions, a full version of logic, such as the
Common Logic standard would be necessary. But a very large amount of
specification can be done in much simpler notations. RDF(S) and OWL
are widely used, but their human factors leave much to be desired.
Recently, Google has released specifications and software for their
_Protocol Buffers_, which use a compact, elegant, humanly readable
notation that is also very efficient for computer processing.
Following is an example from their documentation:
name: "John Doe"
Following is an equivalent in XML (an RDF version would look worse):
For such a short example, the Google form is somewhat more readable
than the XML form, but the difference in readability escalates rapidly
for large examples. Just as important is the computer efficiency.
Compressing XML notations by ZIP or other algorithms takes an enormous
amount of time, but the Google software is extremely fast. Following
is their comment:
> When this message is encoded to the protocol buffer binary format
> (the text format above is just a convenient human-readable
> representation for debugging and editing), it would probably be
> 28 bytes long and take around 100-200 nanoseconds to parse. The
> XML version is at least 69 bytes if you remove whitespace, and
> would take around 5,000-10,000 nanoseconds to parse.
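The byte counts in that comment can be reproduced with a short Python sketch. It rests on assumptions that are not shown in this note: that the message has a second string field, an email with the value "jdoe@example.com", that the two fields are numbered 1 and 2, and that the XML uses person, name and email elements. The helper name is hypothetical, and the sketch says nothing about the parse times.

```python
def encode_string_field(field_number, text):
    """Encode one string field: key byte, length byte, then UTF-8 data.

    Sketch only: handles field numbers below 16 and values shorter than
    128 bytes, so that the key and the length each fit in a single byte.
    """
    data = text.encode("utf-8")
    if field_number >= 16 or len(data) >= 128:
        raise ValueError("outside the single-byte range of this sketch")
    key = (field_number << 3) | 2  # wire type 2 = length-delimited
    return bytes([key, len(data)]) + data


name = "John Doe"
email = "jdoe@example.com"  # assumed second field, not shown in this note

binary = encode_string_field(1, name) + encode_string_field(2, email)
xml = "<person><name>%s</name><email>%s</email></person>" % (name, email)

print(len(binary), len(xml.encode("utf-8")))  # 28 69
```

Under those assumptions the binary form spends two bytes of overhead per field, while the XML form spends most of its 69 bytes on tag names.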
Following is the Google summary, which I find convincing:
> Protocol buffers have many advantages over XML for serializing
> structured data. Protocol buffers:
> * are simpler
> * are 3 to 10 times smaller
> * are 20 to 100 times faster
> * are less ambiguous
> * generate data access classes that are easier to use
The three primary languages they support are Java, C++, and Python.
But other groups have implemented versions for C, C#, Perl, PHP,
Ruby, LISP, Erlang, Haskell, and ActionScript.
The above example came from the Google Developer's Guide:
At the end of this note are some quotations from Google developers.
A very important point is that the Google notation with its supporting
software has been implemented and tested on very large applications --
probably some of the largest applications in the world. And the
software they are now making available (under the Apache license)
is already version 2.0, so the speed bumps have been smoothed out.
Recommendation: I suggest that we use the Google notation to represent
simple type hierarchies at the level of Aristotle's syllogisms. That
is the most commonly used subset of OWL, and it can be automatically
translated to Common Logic, and every other notation that anyone has
been using for ontologies. For more complex expressions, a richer
version of logic could be used. But I would recommend a version of
controlled English as the *primary* notation for complex logic. Other
notations, including OWL or Common Logic, would be compiled *from*
"It's the way we encode almost any sort of structured information which
needs to be passed across the network or stored on disk," said Chris
DiBona, Google's open source programs manager, in a blog post. "We
thought Protocol Buffers might be useful to other people, too, so we've
decided to release it as open source software."
Google software engineer Kenton Varda, in a post on the Google open
source blog, said that Google uses literally thousands of different data
formats, most of which are structured. Encoding these data formats on a
massive scale is too much for XML, so Google developed Protocol Buffers.
Varda compares Protocol Buffers to an Interface Description Language
(IDL), without the complexity. "[O]ne of Protocol Buffers' major design
goals is simplicity," said Varda. "By sticking to a simple
lists-and-records model that solves the majority of problems and
resisting the desire to chase diminishing returns, we believe we have
created something that is powerful without being bloated. And, yes, it
is very fast -- at least an order of magnitude faster than XML."
For Google's FAQ, see