Neil, (01)
I would give the Google developers credit for knowing what they are
doing, especially in version 2.0 of a format that they use for very
large volumes of data. As they say, XML is good for data embedded
in text documents. Otherwise, there are better choices. (02)
> The site states: "this message is encoded to the protocol buffer
> binary format (the text format above is just a convenient
> human-readable representation for debugging and editing)". (03)
That merely means that the source format is the one that should
be used for development (i.e., debugging, editing, and sharing). (04)
> This means that no one other than a programmer will ever be able
> to see that "more readable" form unless the developer goes to the
> trouble to expose it in their application in that format. (05)
That is true of any notation that is compiled for further processing.
Developers use the source form. For any application, no one other
than a programmer ever sees anything except the results that the
program generates. (06)
> This format does not express the structure with the message content,
> one of the greatest strengths of XML and the languages based on it
> such as RDF and OWL. (07)
The data descriptors should be logically accessible when needed,
independent of their physical location. Following is an excerpt from
a note I sent to a different forum in which similar issues were raised. (08)
John
______________________________________________________________________ (09)
> A general return to non-self-documenting formats popular
> when processors were much more limited than they are today
> seems like a step in the wrong direction. (010)
The crucial point is *accessible*. The supporting tools must make
the descriptors accessible whenever they're needed, but that does
not imply that they must be physically stored and duplicated with
every data instance. (011)
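
A minimal sketch of that separation, assuming a hypothetical three-field
record (the field names, the fixed-width layout, and the helper functions
are illustrative only; this is not the actual protocol buffer wire format).
The schema is stored once, each record carries only values, and the
descriptors are reattached on demand when the record is decoded:

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical schema, stored once and shared by every record:
# (field name, struct type code). "I" = 4-byte unsigned, "H" = 2-byte unsigned.
SCHEMA = (("id", "I"), ("price_cents", "I"), ("quantity", "H"))
RECORD = struct.Struct("<" + "".join(code for _, code in SCHEMA))


def encode_binary(record):
    """Pack only the values; the field names stay in SCHEMA."""
    return RECORD.pack(*(record[name] for name, _ in SCHEMA))


def decode_binary(payload):
    """Reattach the descriptors from SCHEMA when they are needed."""
    values = RECORD.unpack(payload)
    return {name: value for (name, _), value in zip(SCHEMA, values)}


def encode_xml(record):
    """Self-describing form: every instance repeats every field name."""
    root = ET.Element("item")
    for name, _ in SCHEMA:
        ET.SubElement(root, name).text = str(record[name])
    return ET.tostring(root)


sample = {"id": 1234, "price_cents": 1999, "quantity": 3}
binary = encode_binary(sample)
xml_bytes = encode_xml(sample)

print(len(binary), "bytes without embedded descriptors")
print(len(xml_bytes), "bytes with descriptors repeated in the instance")
print(decode_binary(binary))
```

The decoded dictionary has the same field names as the XML form, so nothing
is lost to the reader of the data; the names are simply looked up rather
than duplicated.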
Furthermore, processors have never become so fast that it's OK to
be wasteful. If you're processing thousands of data elements, the
overhead may be negligible compared to human response time. But if
you have to make a decision to switch data streams in milliseconds
or microseconds, many applications cannot afford to use XML. (012)
If you are storing and processing petabytes or more of data, a factor
of 2 or 10 can make a difference of many millions of dollars in
hardware cost. Google and many other companies know that. (013)
But if you really need to keep the descriptors physically close, then
there is an alternative that is more expressive than RDF(S) or OWL
and about a factor of 10 more compact. It's also an ISO standard:
Common Logic in either the CGIF or CLIF notations. (014)
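
As a rough illustration only, the sketch below writes one small statement
("a cat is on a mat") in CLIF, CGIF, and RDF/XML and compares the sizes.
The example.org namespace and the names Cat, Mat, and On are hypothetical,
and a single sentence is not a benchmark: the ratio it prints depends on
the example and should not be read as confirming the factor of 10.

```python
import xml.etree.ElementTree as ET

# One statement, "a cat is on a mat", in three notations.
CLIF = "(exists ((x Cat) (y Mat)) (On x y))"
CGIF = "[Cat: *x] [Mat: *y] (On ?x ?y)"
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <ex:Cat>
    <ex:on>
      <ex:Mat/>
    </ex:on>
  </ex:Cat>
</rdf:RDF>"""

# Check that the XML is at least well-formed (raises ParseError otherwise).
ET.fromstring(RDF_XML)

sizes = {
    name: len(text.encode("utf-8"))
    for name, text in (("CLIF", CLIF), ("CGIF", CGIF), ("RDF/XML", RDF_XML))
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```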
My colleague Arun Majumdar demonstrated that point on a project as
a consultant for a major telecom. They had tried to use XML for data
encoding in a task that required switching data streams within
milliseconds, but the system couldn't respond within the time limits.
Arun replaced the XML encoding with conceptual graphs and met the time
constraints. The CG version went into production. (015)
John (016)