Re: [ontolog-forum] fitness of XML for ontology

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Paul Tyson <phtyson@xxxxxxxxxxxxx>
Date: Fri, 07 Feb 2014 22:20:27 -0600
Message-id: <1391833227.5887.67.camel@tristan>

Duane, I appreciate your contributions to this discussion. More below,
little of it directed at you, just using your comments as a springboard.    (01)

On Wed, 2014-02-05 at 21:59 -0800, Duane Nickull wrote: 
> Paul:
> On 2014-02-05 8:57 PM, "Paul Tyson" <phtyson@xxxxxxxxxxxxx> wrote:
> >I agree. But: 1) I don't think any of the specs in the W3C semantic
> >technology stack are aimed directly at supporting academic researches in
> >logic programming and theorem proving; 2) it is trivial to
> >down-translate XML to any leaner notation for specialized processing, or
> >to a display format as an aid to understanding; and 3) the overall
> >benefits of representing enterprise knowledge in XML far outweigh the
> >cost of the extra markup.
> #2 is not necessarily true.  It is not "trivial".  There are actually data
> fragments or structures that are not supported directly by Xml without
> some creative hacking.  Even UML CVD's are not always directly convertible
> to XML.  It also depends on the structures supported by the leaner
> notation.    (02)

I did say *from* XML, but yes, that was perhaps too broad a statement;
and "trivial" of course depends on one's experience with the idiom and
techniques. In my work I have not encountered any transformation
requirement (from XML) that outstrips the capabilities of XSLT 2.0. That
experience includes many XML-to-XML and XML-to-HTML conversions, as well
as to a variety of text formats including SQL, SPARQL, Java, EXPRESS,
CSV, CLIF, JSON, and some proprietary scripting languages.    (03)
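
To make "down-translate" concrete, here is a minimal sketch of one such
conversion, XML to JSON. It is written in Python with the standard
library rather than in XSLT, purely for illustration. The sample
document, the function names, and the shape of the JSON output are all
assumptions of mine, and the sketch deliberately ignores mixed content
(text that follows a child element), so it is lossy for
document-style XML.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Map one element to a dict holding its tag, attributes, text, and children."""
    node = {"tag": elem.tag}
    if elem.attrib:
        node["attrs"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    # Iterating an element yields its children in document order.
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node


def xml_to_json(xml_text):
    """Parse an XML string and serialize the resulting tree as JSON."""
    return json.dumps(element_to_dict(ET.fromstring(xml_text)), sort_keys=True)


# Hypothetical sample data, not taken from any real system.
SAMPLE = '<part id="p1"><name>Bracket</name><qty unit="each">4</qty></part>'

if __name__ == "__main__":
    print(xml_to_json(SAMPLE))
```

The whole conversion is a single recursive walk over the parsed tree.
Going the other way is the harder problem, as Duane notes, because
the leaner notation may carry structures that XML does not express
directly.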

Since I've never run up against this roadblock, I've never researched
it, but I suppose there could be a pair of grammars with some opposing
features that make it impossible to convert an arbitrary instance of one
to the other by conventional means.    (04)

> XML is derived from XML Infoset. There are 11 basic abstract model
> components.  If these are not all supported by the leaner notation,
> problems will arise.    (05)

Not to quibble about chicken and egg, but I wouldn't have put XML and
Infoset in that relation. As you know, but innocent bystanders mightn't,
XML Infoset is the Talmudic commentary on XML that specifies what the
result of parsing an XML instance must be, in precise terms, as the
consuming application should see it.

> #3 is also not necessarily true in all contexts.  Without some form of
> requirements or details of what one is trying to accomplish, such
> statements are misleading.  When you say 'representing', do you mean for
> persisting or merely for marshalling data into XML for app to app
> conveyance?     (06)

Any or all of those. No one sees data on a disk, in a file or database,
or in the wires or radio waves comprising the network. They "see data"
as some arrangement of glyphs and graphic elements on screen or paper,
and that arrangement is not random, but structured so as to "convey
meaning". What is structure? Leaving aside graphic features for the
moment, "structure" in a stream of characters is fundamentally about
containment, sequence, and semiotic identity (more on these below). Many
hands and minds contribute to the final display, but none so strongly
determine the effective "meaning" as the ones that arrange the
characters prior to display: what you call 'merely marshalling data'.    (07)

If we ask of an enterprise where lies the value of its intellectual
assets, the only answer can be: in the selection and arrangement of
words (and other significant tokens) that describe its products,
services, and processes. Unfortunately the "many hands" approach to
enterprise information management gives us a very muddy stew. XML points
a way out of the mess.    (08)

Let me explain why XML is human-computer literacy made manifest.    (09)

When people communicate with one another using words and symbols they
arrange those tokens using the fundamental relationships of containment,
sequence, and--for lack of a better term--"semiotic identity" (that is,
placing the same token at different locations to indicate a relationship
between the contexts in which the identical tokens occur).    (010)

Informal natural language discourse typically occurs in a rich context
where all participants are well-prepared (through shared experience and
learning) to immediately recognize--in fact, to overlook--these
fundamental relationships and proceed immediately to reconstructing the
intended meaning. In this process a great deal of out-of-band
information--that is, information not included in the message at
hand--is used.    (011)

Formal speaking and writing requires the communicator to give more
thought to these fundamental relationships: to organize topics by some
principle or other, put them in sensible order, and to use terms
consistently. Typically, the more formal modes of communication rely
less on out-of-band information for success. In the old days when
enterprises ran on paper communication, documents were highly formalized
information structures to preserve and transmit the intellectual assets
of the company.    (012)

It was the genius of [S]GML to extremize these tendencies toward
formalization in order to make texts "understandable" to the most
demanding--and dumbest--listener, the computer. SGML provided a simple
unambiguous syntax for indicating containment, sequence, and semiotic
identity around and within a sequence of characters. Element start and
end tags signal containment; sequence is (naturally) one element after
another; and the semiotic identity of element and attribute names
indicates some sense of "sameness" among those components. Semiotic
identity also gives rise to the built-in ID/IDREF mechanism (for
schema-driven processing), or any other application-determined notion of
referencing based on string identity. As an added bonus, SGML provided a
simple typing system that fell out from the use of distinguished strings
as element and attribute names, and the ability to specify simple
grammars constraining the containment and sequence relationships that
could be instantiated.    (013)
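
The three relationships can be seen in a few lines of code. What
follows is a small sketch in Python using the standard library; the
sample document and its element and attribute names are invented for
illustration. Note that ElementTree does not process a DTD, so the
reference lookup here is the application-determined kind (plain string
equality on attribute values), not schema-driven ID/IDREF.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; names are assumptions chosen for the example.
DOC = """
<assembly>
  <part id="p1"><name>Bracket</name></part>
  <part id="p2"><name>Bolt</name></part>
  <uses part="p2"/>
  <uses part="p1"/>
</assembly>
"""

root = ET.fromstring(DOC)

# Containment: the start and end tags of <assembly> enclose its children.
# Sequence: iterating the element yields those children in document order.
child_tags = [child.tag for child in root]

# Semiotic identity of names: every element called "part" is treated as
# the same kind of thing, wherever it occurs.
parts = root.findall("part")

# Semiotic identity of values: the string "p2" on <uses> refers to the
# <part> that carries the identical string as its id.
by_id = {p.get("id"): p for p in parts}
used_names = [by_id[u.get("part")].findtext("name") for u in root.findall("uses")]

if __name__ == "__main__":
    print(child_tags)
    print(used_names)
```

Nothing outside the character stream was needed to recover the
structure: the tags, their order, and matching strings carry all of it.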

It is, I have learned, pointless to review the virtues of SG/XML for
people who are busy inventing new ways of doing the same thing. I have
come to view such suboptimal pursuits as IT's contribution to full
employment.    (014)

--Paul    (015)

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (016)
