On Apr 13, 2007, at 3:10 PM, Ed Barkmeyer wrote:
When the URI is a reference to a Web page (full stop), the
resource is the web page, and by extension, the information content of the web
page.
I think of the page and its information content as being
separate.
From an ontological point of view, I may also want to
distinguish the content from its external representation, if that was your
point. But the Web does not make
that distinction. Put another
way, the Web consciously manages external representations of information, and
leaves the abstraction of content to the reader. The whole idea of the Semantic Web is
to provide standard external representations for some orderly abstraction of
content, in order to facilitate search.
I find it important to distinguish the location of the
information from its content, which was my point. So perhaps we are talking past each
other.
But the definition of URI (IETF RFC 2396) says it identifies
a "resource".
For example, I can make statements about the style of the
page display, the server where the <html> tags reside, the provenance
information for the page. These
are all separate from the information content of the page.
We have now identified several distinguishable concepts:
1) the place (location)
2) the presentation structure (web page)
3) the information content
4) a formal description of the content
5) the "provenance metadata" for the content
6) the provenance metadata for the presentation
7) the provenance metadata for the presentation in that place
And we could easily make a model (ontology) for these things
and their relationships:
place(1) conveys presentation(2)
presentation(2) conveys content(3)
content(3) has formal description(4)
content(3) has provenance of content(5)
presentation(2) has provenance of presentation(6)
place(1) has provenance of site content(7)
Further we note that there are other possibilities. In particular,
place(1) provides service(8)
service(8) permits access to presentation(2)
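The relationships listed so far can be written down as plain subject/relation/object triples. What follows is only a minimal sketch in Python, not a proposed ontology language: the names TRIPLES, objects_of and reachable_from are made up for illustration, and the numbers (1)..(8) are dropped from the concept names.

```python
# Hypothetical encoding of the relationships above as
# (subject, relation, object) triples; all names are illustrative.
TRIPLES = [
    ("place", "conveys", "presentation"),
    ("presentation", "conveys", "content"),
    ("content", "has", "formal description"),
    ("content", "has", "provenance of content"),
    ("presentation", "has", "provenance of presentation"),
    ("place", "has", "provenance of site content"),
    ("place", "provides", "service"),
    ("service", "permits access to", "presentation"),
]


def objects_of(subject, relation):
    """Return everything the subject is linked to by the given relation."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def reachable_from(start):
    """Return every concept reachable from start by following any relation."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s, _, o in TRIPLES:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen
```

Asking reachable_from("place") returns every other concept in the list, which is one way of seeing that the place is the only entry point the Web itself gives you.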
RFC 2396 is pretty clear that a URL identifies a place(1), full stop,
and indicates a means of access to whatever is at that place. From our would-be ontology above,
what is thus addressed is either a presentation/document or a service.
By comparison, RFC 2396 says that a URI identifies a
"resource". And all of
(2),(4),(5),(6),(7) and the service (8) are distinct resources that may be
found at the *same site*. (I
think the Web view is that content(3) is only accessible through its
presentation(2).) It follows
that each of them should have a distinct URI.
Those URIs may be distinct URLs in their own right, or they may all
incorporate a common URL and each have a distinct fragment identifier.
Since a URL always identifies a place, if the distinct
resources have distinct URLs, our model above needs some additions:
place(1) conveys formal description(4)
place(1) conveys provenance of content(5)
place(1) conveys provenance of presentation(6)
place(1) conveys provenance of site content(7)
One place can convey some or all of (2),(4),(5),(6),(7),(8),
but when one place conveys more than one of them, each has a distinct URI whose
"fragment identifier" distinguishes the "component". And by convention, in those cases,
the URI with no fragment identifier (the simple URL) conveys either (2) or (8). It is also possible that we have a
(9), which is a web page that is a container for (2),(4),(5),(6),(7), delivered
as a single resource.
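To make the fragment-identifier convention concrete, here is a small sketch using urldefrag from Python's standard library. The example.org URIs and the fragment names are invented for illustration; nothing in RFC 2396 prescribes them.

```python
from urllib.parse import urldefrag

# Hypothetical URIs: one place (URL) plus fragment identifiers for components.
uris = [
    "http://example.org/report",              # bare URL: presentation (2) or service (8)
    "http://example.org/report#description",  # formal description (4)
    "http://example.org/report#provenance",   # provenance of content (5)
]


def split_place_and_component(uri):
    """Split a URI into the place (URL without fragment) and the fragment."""
    place, fragment = urldefrag(uri)
    return place, fragment


# All three URIs are distinct identifiers, yet they share a single place.
places = {split_place_and_component(u)[0] for u in uris}
```

The three strings compare unequal as identifiers, but stripping the fragment leaves exactly one place, which is the situation described above.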
Note that our model is starting to get rather messy.
This is why Tim Berners-Lee says you need to impose some
discipline on your site. The
problem is that several different conventions have emerged (including not
imposing any discipline), and there are no reference standards.
In a somewhat different vein, I wrote:
I have argued with TBL before that URIs that are URLs
confuse WHAT something is with WHERE it is.
And it is only an acceptable idea when that relationship is required to
be 1-to-1. The idea of
identifiers is that you can test them for equality.
When the same thing can be in multiple places, unequal doesn't tell me
anything, which is annoying, especially when tools think unequal to the
expected value means unusable. And
when the same place can hold different things, equal doesn't tell me anything,
which defeats the purpose.
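Both failure modes can be shown in a few lines. This is a sketch under my own assumptions: the URLs and byte strings are invented, and using a SHA-256 digest as a stand-in for "what the thing is" is only an illustration, not something the Web or any RFC provides. A digest is also stricter than "same specification", since two versions of one document would get different digests.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Identify content by what it is (a digest), not by where it is."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical snapshot: two places holding the same document.
spec = b"the specification text"
mirror_a = ("http://example.org/spec", spec)
mirror_b = ("http://mirror.example.net/spec", spec)

# Hypothetical later snapshot: the first place now holds something else.
later_a = ("http://example.org/spec", b"an unrelated advertisement")

# Unequal URLs, same thing: comparing locations tells me nothing.
same_thing_different_place = (
    mirror_a[0] != mirror_b[0]
    and content_id(mirror_a[1]) == content_id(mirror_b[1])
)

# Equal URLs, different things: comparing locations again tells me nothing.
same_place_different_thing = (
    mirror_a[0] == later_a[0]
    and content_id(mirror_a[1]) != content_id(later_a[1])
)
```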
What you are saying is it doesn't serve the purpose you have
in mind, not that it doesn't serve other purposes quite well. One could say the success of the Web
shows a real value.
Whoa! I fully
agree that URLs locate lots of useful and functionally different things, just
as postal addresses do. But if
today it's a bank and tomorrow it's a laundry or a residence or a casino, what
"resource" is being "identified"?
What I said was that if the content to which a URI refers
changes radically from day to day, the URI doesn't identify "an
information resource" in any useful sense. And thus the idea that the URI
identifies something different from a location is false. If the purpose of a URI is to denote
content, function, behavior, as distinct from location, some one of those has
to be consistent over time. A
bulletin board and a pulpit are just locations.
(I wonder how many XML tools would break if the namespace
URL for XML Schema pointed to a local copy of the specification... Is the W3C URI THE name or A name for
the XML Schema specification?)
This is where provenance comes in. It is THE URI if you believe W3C to
be the authoritative source.
1. The location of the document
2. The identity of the document as the one issued by the authoritative source.
Example: The
authoritative source for the Oxford Dictionary of English is presumably in
Oxford, England, but I can find the document at my public library.
All of the copies of the ODE have the same designation, but
you can find copies in lots of places. So
if I point you to a place where you can find it, that has nothing to do with
the authoritative source.
But my example was wrong.
The xmlns reference is to the "namespace URI", which is the
required *identifier* for the specification.
The tool is free to get a copy from anywhere it likes. So if I put another URL there, it may
be a location of a copy of the specification, but it is NOT the *identifier*,
and the tool should fail. It is
exactly as if I referred to the "Peoria Public Library's dictionary"
instead of the ODE.
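A sketch of what such a tool does, using Python's standard xml.etree.ElementTree: it compares the namespace name as a string and never dereferences it. The function name and the "local copy" URL are hypothetical; the XML Schema namespace name is the real one.

```python
import xml.etree.ElementTree as ET

XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"  # the XML Schema namespace name


def uses_xml_schema_namespace(document: str) -> bool:
    """Check the root element's namespace by string comparison; nothing is fetched."""
    root = ET.fromstring(document)
    # ElementTree writes qualified tags as "{namespace}localname".
    if not root.tag.startswith("{"):
        return False
    namespace = root.tag[1:].split("}", 1)[0]
    return namespace == XSD_NAMESPACE


good = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'
# Hypothetical URL of a local copy: a location, but not the identifier.
copy = '<xs:schema xmlns:xs="http://example.org/local-copy/XMLSchema"/>'
```

Here uses_xml_schema_namespace(good) is true and uses_xml_schema_namespace(copy) is false, even if the second URL happened to serve a byte-identical copy of the specification.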
The webhead idea is that you will always go to the URL,
fetch the resource, and use it. The
idea that a tool has been pre-programmed to support that *content*, and that
conducting a web-based transaction might therefore require the tool to fetch and
compare two 10MB files to determine whether they are *versions of* the same
specification, is beyond their hobbyist view of the Internet.
So what metadata do you need in place to support your use? How do you want to create and
maintain that metadata? Will you
make it available for others to use?
Ah, now we are talking about what "responsible
management" of referenceable resources might be. This is the kind of discipline that
the WebDAV folks have worked on, and there is a "widely accepted"
scheme for life cycle management of documents. The trouble is that it is widely
accepted among the various organizations involved in making document and
metadata standards, but those folks operate and influence less than 1% of
websites. It does mean that publishers,
and standards organizations, and library websites will probably use it.
Everything is a resource to someone, as it should be. What we want to be able to do is
differentiate resources so we use the one(s) most suitable for our needs.
Exactly. But
unless there are common conventions for that differentiation, all we have is a
bunch of disorganized resources labeled according to hundreds or thousands of
incompatible schemes, most of which are not very good or very useful. Google has built a successful
enterprise on the failure of the Web, and its principal resources, to address
that problem. And there are many
who believe that that also is as it should be.
IMO, the problem is that the Internet is still the big city of
the Middle Ages. We know how to build all kinds of buildings and we have a lot
of demand for them and a lot of construction of various kinds and qualities
going on. But no one is
responsible for much of it, we have no civil engineering discipline, we have no
land use planning, we have random patchworks of streets, we are carrying the
water on foot in buckets from the most convenient well, we have no police force
and no fire brigade, we have sewage problems, crime problems and frequent
plagues. Some communities thrive
and some die out, and we don't really understand why. And yet people keep coming here,
because there is education, and jobs, and entertainment, and money to be made. Ultimately, technology enabled us to
get control of it, and fires and plagues forced us to. But it took 7 centuries. I hope the Internet experience is
shorter.
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263                Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263                FAX: +1 301-975-4694
"The opinions expressed above do not reflect consensus
of NIST, and have not been reviewed by any Government authority."