Wacek: (01)
The concept of "well-formedness" in XML is now well established even if formal
logicians poke holes in the terminological inconsistency. (02)
The problem is a hangover from HTML, where many HTML documents, though
syntactically incorrect, were still parsed without browsers falling over. It is
precisely the high fault-tolerance of most browsers that keeps the Web turning
today. If only well-formed documents passed muster, the Web as we know it would
be a very different place. Some would argue that would be a good thing, but if
the problem was going to be addressed, it should have been addressed 15 years
ago, not now. As it is, millions of mere mortal non-programmers grokked the
basics of writing HTML and getting sites up and running, and that was
considered more important than logical purity. That's why repeated attempts to
lock the door after the horse had bolted failed. (03)
XML (re-)introduced the concept of "well-formed" in order to redress the
balance, but alas, there are many tools out there that still generate crap code,
whether in HTML, XHTML or another XML application. (04)
"Well-formedness", therefore, although tautological, serves as a useful badge of
conformance for those who do try to make the effort. (05)
Peter (06)
-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Waclaw Kusnierczyk
Sent: 18 April 2007 00:51
To: edbark@xxxxxxxx
Cc: Ontolog Forum
Subject: Re: [ontolog-forum] OWL and lack of identifiers (07)
One more terminological note: (08)
It is confusing to say that an XML document is well-formed in the sense
of its conformance to the XML syntax, since: (09)
- this way of speaking suggests that there can be a
non-well-formed XML document (where 'well-formed' still refers to the
XML syntax), while
- a document that does not conform to the XML syntax simply is not an
XML document. (010)
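A minimal sketch of this point, using Python's standard `xml.etree.ElementTree` parser (the parser choice and the helper name `parse_xml` are illustrative assumptions, not anything from this thread): a conforming XML parser either returns a document tree or rejects the input outright, so it never produces a "non-well-formed XML document" to talk about.

```python
import xml.etree.ElementTree as ET


def parse_xml(text):
    """Return the parsed root element, or None if `text` is not XML at all.

    ElementTree either builds a tree or raises ParseError; it never hands
    back a "non-well-formed XML document" object.
    """
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


print(parse_xml("<note><to>Ed</to></note>") is not None)  # True
print(parse_xml("<note><to>Ed</note>"))                   # None: mismatched tag
```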
It makes sense to speak of a valid or invalid XML document, though, with
'valid' referring to another syntax definition (an XML Schema or a DTD, say).
But, again, to say that an XSLT document is valid, with 'valid'
referring to XSLT syntax, is equally redundant. (011)
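Validity in this second sense is a further check applied to something that is already XML. The sketch below illustrates the layering under an explicit assumption: `NOTE_SCHEMA` and `is_valid` are hypothetical stand-ins invented here, since Python's standard library does not validate against DTDs or XML Schemas. The toy "schema" just maps each element name to the exact sequence of child elements it must contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "schema": each element name maps to the exact sequence
# of child element names it must contain. A stand-in for a DTD or XML
# Schema, not a real schema language.
NOTE_SCHEMA = {"note": ["to", "body"], "to": [], "body": []}


def is_valid(element, schema):
    """Check an already-parsed (hence well-formed) tree against `schema`."""
    if element.tag not in schema:
        return False
    if [child.tag for child in element] != schema[element.tag]:
        return False
    return all(is_valid(child, schema) for child in element)


good = ET.fromstring("<note><to>Ed</to><body>Hi</body></note>")
bad = ET.fromstring("<note><body>Hi</body></note>")  # well-formed, but no <to>
print(is_valid(good, NOTE_SCHEMA))  # True
print(is_valid(bad, NOTE_SCHEMA))   # False
```

Both inputs parse, so both are XML documents; only the second check separates them, which is why "valid/invalid XML document" is a meaningful distinction.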
vQ (012)
Ed Barkmeyer wrote:
> Wacek,
>
> you wrote:
>
>> I was actually curious whether you would refer to the XML example.
>> 'Well-formed' and 'valid' are synonymous here, in that both
>> speak of a document's conformance with a grammar;
>
> With that definition, the terms are equivalent, yes. One can distinguish:
> well-formed/valid with respect to the XML grammar
> from
> well-formed/valid with respect to the DTD/schema
> But XML chooses to use 'well-formed' for the first, and 'valid' for the
> second.
>
>> equally well it might be said that a document is valid as an XML
>> document and well-formed as an RDF document. This is just a
>> terminological convention.
>
> Exactly.
>
>> The issue with IANA is a bit different, in that a domain name may be
>> registered and unregistered, i.e., its status as valid or not may
>> change, though the rules of the language do not change, in general.
>
> An interesting point. This is a big difference between a registry of
> values and an enumerated list. The validity of a symbol is
> time-dependent, and related to events out of the context of the usage.
> So my earlier analogy with local symbol definitions, which are solely
> dependent on other elements of the context of use, is inappropriate.
>
>> On the other hand, a document that is XML-well-formed and XXX-valid
>> remains well-formed and valid, unless the grammars are redefined.
>> (You might say that a change in IANA registry is, analogously, a
>> change in the language's rules, if you insist.)
>
> I agree that is stretching the point. The time-dependency is, to me, a
> good reason for discarding the idea that "hostname validity", in the
> sense of being registered, is "syntactic".
>
>> The issue is that in the case of XML the use of 'well-formed' for the
>> one and 'valid' for the other is well-defined and documented. In the
>> case of URLs, I am not sure whether there is an officially established
>> convention of calling a URL 'well-formed' if it conforms to the RFC
>> and calling it 'valid' if it is registered; maybe there is, but maybe
>> this is only wishful thinking.
>
> Upon examination, RFC 1123 refers to "syntactically valid" hostnames as
> those conforming to the production rules. Beyond that, the rest of the
> terminology is all wrapped up with the DNS protocols and failure modes.
> There really isn't a concept of "valid hostname"; the concept is "DNS
> lookup succeeds". This is further complicated by the fact that the DNS
> folk believe their mechanism can work for anything that supports URI
> syntax (as long as the names are short enough), and therefore DNS is not
> limited to hostname lookups.
>
> So I have to admit that there is nothing "syntactic" about the validity
> of a hostname beyond its lexical rules. After that, there is a process
> that succeeds or fails in one of several ways. The function of the
> hostname in a URL is to identify one or more Internet hosts that MAY be
> able to provide the service that maps the URL to a resource. And the
> mapping of hostname to host is properly seen as just part of the complex
> process of performing the URL-to-resource mapping.
>
> I also have to admit that until just now, I hadn't looked at RFC 1123
> for years, and I clearly should have, before creating a concept not
> actually supported by the standards. My apologies to all.
>
> -Ed
>
> P.S. I realize that this is one of those areas in which the "grey hair"
> is not useful. A certain implementation, with which I used to be
> intimately familiar (15 years ago), talks about "valid host names", but
> the standards don't. More importantly, that experience predates HTTP as
> the dominant Internet protocol; the concept of a URL was just coming into
> existence. And in that time, one's tool did the hostname lookup to find
> the IP address, and then initiated the application-specific protocol.
> So "hostname validity" was a concept in its own right. But the URL/URI
> has made the "hostname" issue just a technical element of the "valid
> URL" concept. And in this "identifier" context, separating out the
> "hostname" is inappropriate -- it is now a "lower-level concern". I
> just didn't realize until now that I'm still carrying old baggage in
> this area.
> (013)
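Ed's distinction between the lexical rules for a hostname and the process of looking it up can be sketched in Python. The helper names `is_syntactic_hostname` and `lookup` are hypothetical, and the lexical rule is an assumed common reading of RFC 1123: dot-separated labels of 1 to 63 letters, digits or hyphens, not starting or ending with a hyphen (RFC 1123 allows a leading digit), with an assumed overall cap of 255 characters. The first function depends only on the string; the second depends on the state of DNS at the moment it is called.

```python
import re
import socket

# One label: 1-63 letters/digits/hyphens, no leading or trailing hyphen.
# RFC 1123 relaxed RFC 952 so that a label may begin with a digit.
LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def is_syntactic_hostname(name):
    """Purely lexical check; says nothing about whether the name is registered."""
    if not name or len(name) > 255:  # assumed overall length cap
        return False
    return all(LABEL.match(label) for label in name.split("."))


def lookup(name):
    """Classify a hostname: the syntax check is static, the lookup is not."""
    if not is_syntactic_hostname(name):
        return "syntax-error"  # rejected without touching the network
    try:
        socket.getaddrinfo(name, None)
        return "resolved"
    except socket.gaierror:
        # gaierror carries the resolver's specific failure code; a fuller
        # version could distinguish "no such name" from "try again later".
        return "lookup-failed"


print(is_syntactic_hostname("3com.com"))  # True: leading digit allowed
print(lookup("bad_host.example"))         # syntax-error: underscore not allowed
```

Only `lookup` can change its answer over time for the same input, which matches the observation that a "valid hostname" in the registered sense is the outcome of a process, not a syntactic property.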
--
Wacek Kusnierczyk (014)
------------------------------------------------------
Department of Information and Computer Science (IDI)
Norwegian University of Science and Technology (NTNU)
Sem Saelandsv. 7-9
7027 Trondheim
Norway (015)
tel. 0047 73591875
fax 0047 73594466
------------------------------------------------------ (016)
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx (017)