Wacek,
Now that we agree we are splitting hairs, ...
you wrote:
> A formal language is a set of strings over an alphabet -- a set of
> symbols. You may define the language ostensibly, by listing all the
> strings, which would imply which symbols are in the alphabet. You may
> define the language by means of a grammar, that is, specify the alphabet
> and rules for forming strings in the language. You may do both, but the
> definitions must agree.
>
> When you say that a URL may be a well-formed string and yet not in the
> language (i.e., not registered by IANA as an element of the language),
> then you state a contradiction. Either the string is not well-formed,
> or it is in the language.
Note that the XML Recommendation also makes a distinction between
"well-formed" and "valid". "Well-formed" XML fits the grammar given in the
XML Recommendation; "validity" is defined with respect to some "document
definition", which is, in effect, a further grammar imposed on the XML lexical
tokens. The original DTD document definitions were clearly only syntactic
specifications, whereas XML Schema document definitions have greater
pretensions (they are more than syntax and less than content models). But if
we ignore the bastard aspects of XML Schema, what we have is two levels of
syntactic requirements. A string can therefore be well-formed, and an
instance of the "XML language" as defined in the Recommendation, and still
fail to be a valid "document", i.e. fail to be an instance of the "document
language" defined by the document definition. But they are both "syntactic"
grammars, and a failure to meet the specifications of either may therefore be
considered a "syntax error". XML prudently avoids use of the term.
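
As a rough illustration of the two levels, here is a minimal Python sketch. The first check uses the standard library's parser and only asks whether the string fits the XML grammar. The second check is NOT a DTD validator: DOCUMENT_DEFINITION and ROOT_ELEMENT are hypothetical stand-ins for a "document definition", reduced to "which child elements may appear inside which element".

```python
import xml.etree.ElementTree as ET

# Hypothetical "document definition": for each element name, the set of
# child element names it may contain. A stand-in for a DTD, not a real one.
DOCUMENT_DEFINITION = {
    "note": {"to", "body"},
    "to": set(),
    "body": set(),
}
ROOT_ELEMENT = "note"  # hypothetical required root element


def is_well_formed(text):
    """Level 1: does the string fit the XML grammar?"""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_valid(text):
    """Level 2: does a well-formed string also fit the document definition?"""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not even well-formed
    if root.tag != ROOT_ELEMENT:
        return False
    for element in root.iter():
        allowed = DOCUMENT_DEFINITION.get(element.tag)
        if allowed is None:
            return False  # element not declared in the definition
        if any(child.tag not in allowed for child in element):
            return False  # child not permitted inside this element
    return True
```

A string such as `<note><cc>x</cc></note>` passes the first check and fails the second, which is the case described above: an instance of the "XML language" that is not an instance of the "document language".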
My position was only that there are also two levels at work in the "syntactic"
validity of URLs. One is being "well-formed", i.e. conforming to the lexical
requirements for a URL per RFC 2396. The other is that the 'host' token be
"valid", as defined by the IANA registry. One could equally well argue that
the latter is a "semantic" concern. But there are clearly separable semantic
concerns -- whether the URL refers to something and whether that something is
accessible.
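
The same split can be sketched for URLs in Python. Everything named here is an assumption made for illustration: HOST_SHAPE is a simplified hostname pattern, not the full RFC 2396 grammar, and REGISTERED_TOP_LEVEL_LABELS is a hard-coded, hypothetical stand-in for a registry lookup. Whether the URL refers to anything, or whether that thing is accessible, is deliberately not checked.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stand-in for a registry lookup; a real check would consult
# actual registry data rather than a hard-coded set.
REGISTERED_TOP_LEVEL_LABELS = {"com", "org", "net", "gov"}

# Simplified hostname shape (an assumption, not the RFC 2396 grammar):
# dot-separated labels made of letters, digits and inner hyphens.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOST_SHAPE = re.compile(r"^" + LABEL + r"(?:\." + LABEL + r")*$")


def is_well_formed_url(url):
    """Level 1: lexical shape only (scheme present, host looks like a hostname)."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    host = parts.hostname
    return bool(parts.scheme) and host is not None and bool(HOST_SHAPE.match(host))


def has_registered_host(url):
    """Level 2: the host's last label appears in the (hypothetical) registry."""
    if not is_well_formed_url(url):
        return False
    host = urlsplit(url).hostname
    return host.rsplit(".", 1)[-1] in REGISTERED_TOP_LEVEL_LABELS
```

Under these assumptions, `http://example.notatld/x` is well-formed but fails the second check, while `http://bad_host.com/` fails the first.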
The problem seems to be that in these ad hoc languages with multiple levels of
parsing grammars, the traditional terminology of formal languages -- lexical,
syntactic, semantic -- is inaccurate and insufficient. By the same token (so
to speak), the OMG Model-Driven Architecture has run aground on the inability
to define how many levels of model there are and how to associate content with
levels.
What is at work here is that computer science has made a major advance in the
last 10+ years: where we used to have tools that defined the capabilities,
and languages that matched the tool capabilities, we now have dynamically
configurable tools that support the capabilities, and languages that let the
user define the capabilities.
-Ed
P.S. With respect to the last, I'm sure people will tell us that tool suite
XYZ had this capability 30 years ago. The point is that this is now
mainstream behavior, not the uncharted island of technical excellence in a sea
of average competence.
--
Edward J. Barkmeyer Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263 Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263 FAX: +1 301-975-4694
"The opinions expressed above do not reflect consensus of NIST,
and have not been reviewed by any Government authority."