On Apr 12, 2007, at 6:00 PM, Ed Barkmeyer wrote:
Ken Laskey wrote:
When talking about a URI that dereferenced to a Web page, I often questioned whether I was talking about:
- the Web page (say, a collection of information on King Arthur),
- the subject of the Web page (i.e. King Arthur), or
- some particular piece of information (say, King Arthur had a sword named Excalibur)
The URI is by definition a designator for the "resource". So perhaps the question is: "What is the resource?"
When the URI is a reference to a Web page (full stop), the resource is the web page, and by extension, the information content of the web page.
I think of the page and its information content as being separate. For example, I can make statements about the style of the page display, the server where the <html> tags reside, or the provenance information for the page. These are all separate from the information content of the page.
This latter may be what Ken means by "the subject of the Web page". The problem is that the information content of the same Web page can change, and that is not true of any other resource. On the Web, the content doesn't really have identification when the page is volatile.
I could, however, make a comment about the content of the page on a given date/time. That would certainly be true if I pointed out an error in the content and the error was corrected.
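One way to picture a comment about "the content of the page on a given date/time" is to record the URI together with a retrieval time and a digest of what was retrieved. The following is only a minimal sketch: the Snapshot type, the example.org URI, the timestamps, and the page text are all hypothetical and are not anything proposed in this thread.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record: what a URI returned at one moment in time."""
    uri: str
    retrieved: datetime
    digest: str  # SHA-256 of the bytes that were retrieved


def snapshot(uri: str, content: bytes, retrieved: datetime) -> Snapshot:
    # The digest stands in for "the content as it was then".
    return Snapshot(uri, retrieved, hashlib.sha256(content).hexdigest())


# Illustrative data only: the page contains an error, which is later corrected.
before = snapshot(
    "http://example.org/arthur",
    b"King Arthur had a sword named Excalibor",
    datetime(2007, 4, 12, 18, 0, tzinfo=timezone.utc),
)
after = snapshot(
    "http://example.org/arthur",
    b"King Arthur had a sword named Excalibur",
    datetime(2007, 4, 13, 9, 0, tzinfo=timezone.utc),
)

# Same URI, different content: a remark about the error attaches to `before`.
print(before.uri == after.uri, before.digest == after.digest)
```

A comment attached to the (URI, time, digest) triple stays meaningful after the correction, whereas a comment attached to the bare URI does not.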
I would go so far as to say that stable web pages correspond to the concept "resource" and volatile web pages don't. There is probably a useful distinction between a web page that changes to contain the "current" information on a fixed topic, like a weather site, or a wiki, and a web page that is the blog of the day. The former can be viewed as "nearly stable" from the point of view of information content, while the latter is just a location at which somewhat random information is posted. And the latter is a place, not a resource.
I can certainly see some fine RDF statements based on your definition of stable, where the object is the URI of the Web page but the definition of stable is also a resource identified by a URI.
I have argued with TBL before that URIs that are URLs confuse WHAT something is with WHERE it is. That is only an acceptable idea when the relationship is required to be 1-to-1. The idea of identifiers is that you can test for equality. When the same thing can be in multiple places, unequal doesn't tell me anything, which is annoying, especially when tools think unequal to the expected value means unusable. And when the same place can hold different things, equal doesn't tell me anything, which defeats the purpose.
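The two failure modes can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any real tool: the FETCHED table stands in for actual retrieval, and the example.org and example.net URLs and their contents are invented.

```python
import hashlib

# Hypothetical table of what each URL returned when fetched (no network access).
FETCHED = {
    "http://example.org/spec.xsd": b"<schema version='1.0'/>",
    "http://mirror.example.net/spec.xsd": b"<schema version='1.0'/>",
    "http://example.org/blog/today": b"Monday's post",
}


def same_name(uri_a: str, uri_b: str) -> bool:
    """Identifier test: plain string equality on the URIs."""
    return uri_a == uri_b


def same_content(uri_a: str, uri_b: str) -> bool:
    """Content test: compare digests of the bytes each URI returned."""
    digest_a = hashlib.sha256(FETCHED[uri_a]).hexdigest()
    digest_b = hashlib.sha256(FETCHED[uri_b]).hexdigest()
    return digest_a == digest_b


original = "http://example.org/spec.xsd"
mirror = "http://mirror.example.net/spec.xsd"

# Same thing in two places: the names are unequal, yet the content is identical.
print(same_name(original, mirror), same_content(original, mirror))
```

The reverse case needs no code at all: if the bytes stored under "http://example.org/blog/today" change tomorrow, same_name still returns True for that URI against itself, although the thing named is different.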
What you are saying is that it doesn't serve the purpose you have in mind, not that it doesn't serve other purposes quite well. One could say the success of the Web shows it has real value.
Sir Tim's view is that URLs can be useful URIs and it is up to the provider to manage it properly, as W3C does. Unfortunately, much of the Internet and much of the tooling is built by people who don't know about, don't care about, or don't understand that they should care about, responsible management.
(I wonder how many XML tools would break if the namespace URL for XML Schema pointed to a local copy of the specification... Is the W3C URI THE name or A name for the XML Schema specification?)
This is where provenance comes in. It is THE URI if you believe W3C to be the authoritative source. If you believe something else is more authoritative, for example if someone else was doing better incorporation of errata, then your idea of authoritative (expressed in a resource identified with a URI) might be more appropriate.
The webhead idea is that you will always go to the URL, fetch the resource, and use it. The idea that a tool has been pre-programmed to support that *content*, and that conducting a web-based transaction might therefore require the tool to fetch and compare two 10MB files to determine whether they are *versions of* the same specification, is beyond their hobbyist view of the Internet.
So what metadata do you need in place to support your use? How do you want to create and maintain that metadata? Will you make it available for others to use?
Now, if the resource is a "particular piece of information", that should be different. The URI for a particular piece of information should extend the URL by a "fragment identifier". HTML fragment identifiers are bookmarks that tell you where the piece of information starts but unfortunately don't bound the other end. XPath fragment identifiers refer to bounded XML elements that should represent the "particular piece of information" exactly. RDF fragment identifiers are typically XML ids, which refer to a particular XML element that should include the definition of the concept. The XML technology doesn't extend to finding all the references to that id (rdf:about) that extend the concept by adding properties. RDF technology does, but to be useful, that technology has to be what is addressed by the URL -- the RDF query service, which is yet another kind of "resource".
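As a rough sketch of the id-style case described above, the URI can be split into the URL and its fragment, and the fragment can then be resolved to one bounded XML element. Everything specific here is an assumption made for illustration: the example.org URI, the <page> and <fact> element names, the plain "id" attribute, and the second fact are invented, and the document text is inlined rather than fetched.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical document that the URL part of the URI is assumed to return.
DOC = """<page>
  <fact id="excalibur">King Arthur had a sword named Excalibur</fact>
  <fact id="camelot">King Arthur held court at Camelot</fact>
</page>"""


def resolve(uri: str, document_text: str):
    """Split a URI into (url, fragment) and find the element the fragment names."""
    url, fragment = urldefrag(uri)
    root = ET.fromstring(document_text)
    if not fragment:
        return url, root  # no fragment: the resource is the whole document
    # Treat the fragment as the value of an "id" attribute on some element.
    return url, root.find(f".//*[@id='{fragment}']")


url, element = resolve("http://example.org/arthur.xml#excalibur", DOC)
print(url)
print(element.tag, element.text)
```

Unlike an HTML bookmark, the element found this way has both a start and an end, so the "particular piece of information" is bounded. The sketch does not gather other statements elsewhere that refer to the same id; that would take something like the RDF query service mentioned above.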
Everything is a resource to someone, as it should be. What we want to be able to do is differentiate resources so we use the one(s) most suitable for our needs.
-Ed
--
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263 Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263 FAX: +1 301-975-4694
"The opinions expressed above do not reflect consensus of NIST,
and have not been reviewed by any Government authority."