ontolog-forum

Re: [ontolog-forum] OWL and lack of identifiers

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Ed Barkmeyer <edbark@xxxxxxxx>
Date: Mon, 16 Apr 2007 12:50:00 -0400
Message-id: <4623A938.80103@xxxxxxxx>
With respect to what is actually supported by W3C and IETF standards, there 
have been a few bits of misinformation bandied about in this discussion.    (01)

In particular,    (02)

Cory Casanave wrote:
>  Also the URI mechanism has all that is needed to distinguish
> resources from identities.  For some reason we tend to use the web
> protocol "HTTP" whereas this makes no sense for a pure identity.    (03)

More on this below.    (04)

> We could substitute any protocol name in a URI to distinguish
> logical resources, such as:
>  
> "identity://cim3.net/MyCat" (A pure identity)
> Vs.
> "http://www.cim3.net/CIM3_Executive_Brief_files/frame.htm" (A real resource)
>  
> While there is no standard for "identity" it can be used without
> a problem since we are not expecting to utilize it as an internet
> protocol.     (05)

This part is false.  The problem is that, since there is no standard for 
things beginning with the prefix "identity:", there is no agreed upon syntax 
for what follows it, and there is no registration authority that guarantees 
uniqueness of anything that follows it.  Several different resources could be 
named:
   identity:John.Q.Smith
and the same "resource" (whatever we agree is the resource itself) could have 
several different names beginning identity:.  Such "identifiers", therefore, 
do not meet either test for an identifier to be useful.    (06)
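To make the point concrete, here is a minimal Python sketch (assuming Python 3.9+). The generic URI grammar of RFC 3986 accepts any scheme name built from letters, digits, "+", "-" and ".", so "identity:" passes a purely syntactic check. Whether a scheme is registered, and what syntax may follow it, is a separate question that only the IANA registry answers. The SAMPLE_REGISTERED_SCHEMES set is a hypothetical, hard-coded sample for illustration, not the registry itself.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")

# Hypothetical sample of schemes that appear in the IANA URI scheme registry.
# It is NOT the registry; a real tool would have to consult IANA's list.
SAMPLE_REGISTERED_SCHEMES = {"http", "https", "ftp", "mailto", "urn"}

def check_uri_scheme(uri: str) -> tuple[bool, bool]:
    """Return (scheme is grammatical, scheme is in the sample registry)."""
    scheme, sep, _ = uri.partition(":")
    if not sep:
        raise ValueError(f"no scheme in {uri!r}")
    grammatical = SCHEME_RE.fullmatch(scheme) is not None
    return grammatical, scheme.lower() in SAMPLE_REGISTERED_SCHEMES

for uri in ("identity://cim3.net/MyCat",
            "http://www.cim3.net/CIM3_Executive_Brief_files/frame.htm"):
    print(uri, check_uri_scheme(uri))
```

The "identity:" URI comes back as (True, False): grammatical, but nothing in the check says who assigns the characters after the colon, which is exactly the uniqueness problem described above.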

The presumption Cory seems to make is that identity: is followed by a "unique 
domain name" that begins with an IP identifier registered by IANA.  But for 
identity: to be useful to anyone but Cory and a few of his best friends, a 
large body of Internet users would have to agree to use the prefix with that 
syntax.  And that, friends, would be a standard, and to make it maximally 
acceptable, you would register that standard through the IETF (which is where 
http:, and urn:, and ftp:, and so forth are registered).    (07)

In response to Cory, Peter Brown wrote:
> That is what urn: is supposed to do, no?    (08)

Right.  urn: was *supposed to* function as Cory's "identity:", BUT
it does NOT have Cory's presumed syntax.  In fact, the characters period and 
solidus (/) are not even permitted in URNs!  There was a clear, if misguided, 
effort on the part of the designers of URNs to make it impossible for URNs to 
look like URLs, or to contain domain identifiers assigned by IANA.    (09)

The intent of the URN standard is to force a completely separate IANA 
registration authority for "identifiers" that has nothing to do with web 
addresses.  Further, the intent was to ensure "persistent uniqueness" and 
"persistent association".  You can apply to IANA for a urn prefix registered 
to you or your business and you will get back your very own prefix, of the form:
   urn-<digit string>:
and you are free to choose the syntax and association rules for anything 
following that prefix.  And IANA promises that it will never change the 
registration for that urn digit string to any other person or organization, 
ever.  So the identifiers you assign cannot later come to refer to some other 
resource because someone else bought your domain name after you went out of 
business or died or defaulted.  URN was meant to fix all the URL problems with 
uniqueness and persistence.    (010)

The problem with this approach is that you don't get a mnemonic URN 
identifier; you get a digit string.  People won't be able to remember or guess 
these things, and IANA hasn't yet thought about check digits for the 9-digit 
numbers that it would need if every business in the world registered one.  So 
we have yet another number that functions like a telephone number, without the 
very valuable advantage of cascading directories.  As designed, URNs are one 
giant telephone book with an unbounded list of numbers that might well hit 
10**10.    (011)
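IANA defines no check digit for these numbers. Purely to illustrate what one would buy, here is the Luhn (mod 10) scheme, the one used for payment card numbers, applied to a hypothetical 9-digit registration number. Luhn is swapped in here as an example; nothing in the URN standards prescribes it. Any single mistyped digit changes the expected check digit, so the error is caught before anyone tries to look the number up.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit to append to a string of decimal digits."""
    if not payload.isdigit():
        raise ValueError(f"digits only: {payload!r}")
    total = 0
    # Walk right to left; the rightmost payload digit (next to where the
    # check digit will go) is doubled, then every second digit after it.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of d
        total += d
    return (10 - total % 10) % 10

def is_valid(number: str) -> bool:
    """True if the last digit is the correct Luhn check digit for the rest."""
    if len(number) < 2 or not number.isdigit():
        return False
    return int(number[-1]) == luhn_check_digit(number[:-1])

print(luhn_check_digit("123456789"))  # 7, so the full number is 1234567897
print(is_valid("1234567897"))         # True
print(is_valid("1234567887"))         # False: one digit mistyped
```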

IETF did create one loophole.  If your activity provides "general value to the 
Internet community", you can register your favorite set of letters as a URN 
prefix, but you have to produce a standard that defines your URN syntax, and 
register it as an IETF standard.  So, of course, there are registered URN 
prefixes for IETF, W3C, ISO, ITU-T "OID", FIPA, ISBN, OASIS, and a few others.    (012)
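The two kinds of prefix can be told apart mechanically. Below is a minimal Python sketch, assuming the RFC 2141 grammar for the namespace identifier (NID: a letter or digit followed by 1 to 31 letters, digits or hyphens, with "urn" itself excluded) and the RFC 3406 convention that informal namespaces are spelled "urn-" plus an IANA-assigned number. It checks only the form of the NID; whether a formal-looking NID such as "ietf" is actually registered still has to be looked up in the IANA registry. "urn-316" is the illustrative number used in this message, not a real assignment.

```python
import re

# RFC 2141: NID = let-num [ 1,31 let-num-hyp ]   (2 to 32 characters)
NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{1,31}")
# RFC 3406: informal namespaces are "urn-" plus an IANA-assigned number
INFORMAL_NID_RE = re.compile(r"urn-[0-9]+", re.IGNORECASE)

def parse_urn(urn: str) -> tuple[str, str, str]:
    """Split a URN into (NID, NSS, kind); kind is 'informal' or 'formal'.

    Only the *form* of the NID is checked; a 'formal' NID must still be
    looked up in the IANA URN namespace registry to know it is registered.
    """
    prefix, sep, rest = urn.partition(":")
    if not sep or prefix.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    nid, sep, nss = rest.partition(":")
    if not sep or not nss:
        raise ValueError(f"missing namespace-specific string: {urn!r}")
    if NID_RE.fullmatch(nid) is None or nid.lower() == "urn":
        raise ValueError(f"invalid namespace identifier: {nid!r}")
    kind = "informal" if INFORMAL_NID_RE.fullmatch(nid) else "formal"
    return nid, nss, kind

print(parse_urn("urn:ietf:rfc:2141"))           # ('ietf', 'rfc:2141', 'formal')
print(parse_urn("urn:urn-316:cim3:net:MyCat"))  # ('urn-316', 'cim3:net:MyCat', 'informal')
```

A domain name cannot serve as the NID: parse_urn("urn:cim3.net:MyCat") raises ValueError, because the period is not a permitted NID character.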

The registry is at:
   http://www.iana.org/assignments/urn-namespaces    (013)

If Cory wants his "identity:", he can register his own URN prefix, and spell 
it "urn-316:" or whatever magic number he gets.  (I think the next number 
currently available is urn-7!)  But it can't be followed by "//cim3.net", 
because 3 of those characters are not permitted in URNs.  He could use:
   urn:urn-316:cim3:net:MyCat    (014)
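The construction above can be sketched in Python: take the host and path of an "identity:" URI, split the host on "." and the path on "/", and join the pieces with ":" under an informal URN prefix. The function name is hypothetical, and 316 is only the placeholder number from this message.

```python
from urllib.parse import urlsplit

def identity_uri_to_urn(uri: str, nid_number: int) -> str:
    """Map identity://host/path to urn:urn-<nid_number>:host-labels:path-segments."""
    parts = urlsplit(uri)  # urlsplit separates //netloc for any scheme
    if not parts.netloc:
        raise ValueError(f"expected //host after the scheme: {uri!r}")
    if parts.query or parts.fragment:
        raise ValueError(f"query and fragment are not mapped: {uri!r}")
    host_labels = [label for label in parts.netloc.split(".") if label]
    path_segments = [seg for seg in parts.path.split("/") if seg]
    return f"urn:urn-{nid_number}:" + ":".join(host_labels + path_segments)

print(identity_uri_to_urn("identity://cim3.net/MyCat", 316))
# prints urn:urn-316:cim3:net:MyCat
```

Note that the mapping is not reversible: identity://cim3/net/MyCat yields the same URN as identity://cim3.net/MyCat, so whoever holds the prefix still has to publish rules that keep such names apart.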


Waclaw Kusnierczyk wrote:
> Re: John Sowa's post on URIs etc.
> 
> "This discussion raises some serious issues:    (01)
> 
>    4. The URLs and URIs of the WWW are based on a naming
>       scheme that ultimately resolves to physical devices.
>       It guarantees that an identifier will determine a
>       unique storage location at a given point in time.    (05)"
> 
> This is not exactly true.  In w3 specs [1] we read:
> 
> "A Uniform Resource Identifier (URI) is a compact sequence of characters 
> that identifies an abstract or physical resource", and further that "in 
> many cases, URIs are used to denote resources without any intention that 
> they be accessed" and "a common misunderstanding of URIs is that they 
> are only used to refer to accessible resources. The URI itself only 
> provides identification; access to the resource is neither guaranteed 
> nor implied by the presence of a URI."
> 
> A URL is a URI with a specialized syntax: "A URI can be further 
> classified as a locator, a name, or both. The term "Uniform Resource 
> Locator" (URL) refers to the subset of URIs that, in addition to 
> identifying a resource, provide a means of locating the resource by 
> describing its primary access mechanism".  Ultimate resolution to 
> physical devices is *not* a part of the scheme.
> 
> "
>    5. However, the policies of the WWW and of each domain
>       on the WWW permit the same identifiers to be resolved
>       to different physical locations at different times.    (06)
> "
> 
> Precisely.  URLs, those that do identify locations, identify them in 
> virtue of there being a mapping from a particular URI to a physical 
> address, by means of dns services.  And if there is no such mapping, 
> there is no translation, and in effect the URL does not identify a 
> location -- try, e.g., 'http://www.nonsense.no', a (syntactically) 
> *valid* URL.  Some (most?) valid URLs do *not* ultimately resolve to a 
> physical device.    (015)

This is technically partly correct, but the paragraph is badly misleading.    (016)

What RFC 2396 says is that a URL begins with a prefix that identifies an 
Internet access protocol, and that the rest of the URL must follow the syntax 
defined by the specification for that protocol and carry the interpretation 
that specification associates with that syntax.    (017)

The HTTP standard (RFC 2616) defines the rest of the syntax of URIs beginning 
http:, and says:    (018)

    3.2.2 HTTP URL    (019)

    The "http" scheme is used to locate network resources via the HTTP
    protocol.  This section defines the scheme-specific syntax and
    semantics for http URLs.    (020)

    http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]    (021)

    If the port is empty or not given, port 80 is assumed. The semantics
    are that the identified resource is located at the server listening
    for TCP connections on that port of that host, ...    (022)

And, by adoption of text from RFC 2396:
    The host is a domain name of a network host, or its IPv4 address ...    (023)
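As a rough illustration of what RFC 2616 makes an http: URL mean, the sketch below uses Python's urllib.parse.urlsplit to pull out the host, applies the rule that a missing or empty port means 80, and returns the pieces a client would use to open a TCP connection and send a request. The function name is hypothetical, and urlsplit is more permissive than the RFC 2616 grammar; in particular it does not check that the host is a registered domain name.

```python
from urllib.parse import urlsplit

def parse_http_url(url: str) -> tuple[str, int, str, str]:
    """Return (host, port, abs_path, query) for an http: URL, RFC 2616 style."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        raise ValueError(f"not an http URL: {url!r}")
    if not parts.hostname:
        raise ValueError(f"http URL without a host: {url!r}")
    # RFC 2616 3.2.2: "If the port is empty or not given, port 80 is assumed."
    port = parts.port if parts.port is not None else 80
    # RFC 2616 5.1.2: an absent abs_path is sent as "/" (the server root).
    path = parts.path or "/"
    return parts.hostname, port, path, parts.query

host, port, path, query = parse_http_url(
    "http://www.cim3.net/CIM3_Executive_Brief_files/frame.htm")
print(f"open TCP to {host}:{port}, then GET {path}")
```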

That is, the URL *means* that the resource is accessible at the designated 
host via the HTTP TCP port.  It doesn't refer to a *physical device*, but it 
does refer to a specific *host*, and indirectly to the physical device that 
currently responds to that TCP/IP address and provides network services.    (024)

If the URL doesn't mean that, it is not a "valid" HTTP URL, because it does 
not satisfy the requirements of RFC 2616.  So the example URI that Waclaw 
offers:
   'http://www.nonsense.no'
is NOT a "valid" URL.  It may satisfy the grammar requirements, but it 
probably does not satisfy the syntactic requirement for www.nonsense.no to be 
a valid domain name (i.e., a sequence of characters registered as a domain 
name through the Internet cascading directory scheme).  And it surely doesn't 
satisfy the requirement for it to refer to an accessible resource.  So it is 
likely to be *syntactically invalid* and it has *no valid interpretation*.    (025)
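The distinction between satisfying the grammar and naming a registered host can be sketched in Python. The first function checks only the label syntax of a host name (letters, digits and hyphens, 1 to 63 characters per label, no leading or trailing hyphen, at most 253 characters overall). The second asks the local resolver whether the name currently maps to an address. Both function names are illustrative. "www.nonsense.no" passes the first test; the outcome of the second depends on the live DNS at the moment you run it and requires network access.

```python
import re
import socket

# One DNS label: 1-63 letters, digits or hyphens, no leading/trailing hyphen
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def hostname_is_well_formed(host: str) -> bool:
    """Purely syntactic check; says nothing about registration in DNS."""
    if len(host) > 253:
        return False
    labels = host.rstrip(".").split(".")  # tolerate one trailing root dot
    return all(LABEL_RE.fullmatch(label) for label in labels)

def hostname_resolves(host: str) -> bool:
    """Ask the local resolver for a TCP address; needs network access."""
    try:
        socket.getaddrinfo(host, 80, type=socket.SOCK_STREAM)
        return True
    except (socket.gaierror, UnicodeError):
        return False

print(hostname_is_well_formed("www.nonsense.no"))  # True: grammar is satisfied
```

Calling hostname_resolves("www.nonsense.no") is the step that tests whether the name was actually registered, which is the part of validity the grammar cannot check.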

[I would point out that this is precisely the basis on which I challenge the 
W3C opinion, i.e., that of its founder, that http://<domain name>/ is an 
excellent way to begin URIs for arbitrary purposes.  It is only a good way to 
construct URIs for network-accessible resources, because that is what it was 
defined to do and to mean.  And every Internet tool has a right to assume that 
a URI that begins http: means what RFC 2616 says it means.]    (026)

You may be less than satisfied with the existing IETF standards meant to 
support the emerging URI needs.  But you don't get to invent uses and 
interpretations that the standards don't support and were not intended to 
support.    (027)

If you want to fix the problems, you have to go to IETF and fix the standards.
And that means you must have a better idea, with the consequences analyzed,
and a lot of friends who also want to solve the problem in exactly that way. 
That is how Internet standards evolve.  And after a few early stumbles, that 
mechanism has worked for 23 years, and created the opportunity for the 
Internet we know.    (028)

-Ed    (029)


-- 
Edward J. Barkmeyer                        Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263                Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263                FAX: +1 301-975-4694    (030)


_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx    (031)
