ontolog-forum

Re: [ontolog-forum] {Disarmed} Re: OWL and lack of identifiers

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>, <edbark@xxxxxxxx>
From: Duane Nickull <dnickull@xxxxxxxxx>
Date: Fri, 13 Apr 2007 19:19:03 -0700
Message-id: <C2458827.C30F%dnickull@xxxxxxxxx>
Wrong:

Three things are needed to retrieve a resource:

  1. the location
  2. the protocol and methodology required
  3. a unique identifier

Comments inline.


On 4/13/07 6:30 PM, "Cory Casanave" <cory-c@xxxxxxxxxxxxxxxxxxxxxxx> wrote:

Also the URI mechanism has all that is needed to distinguish resources from identities. For some reason we tend to use the web protocol "HTTP", whereas this makes no sense for a pure identity. We could substitute any protocol name in a URI to distinguish logical resources, such as:

"identity://cim3.net/MyCat" (A pure identity)

DN: This identifies the resource but is not unique and has no retrieval mechanism involved.

Vs.
"http://www.cim3.net/CIM3_Executive_Brief_files/frame.htm" (A real resource)

DN: This uniquely identifies the resource and also declares the protocol and methodology to get a serialization of the resource.

 
While there is no standard for "identity" it can be used without a problem since we are not expecting to utilize it as an internet protocol.

DN:
URLs are used in namespace values since they are guaranteed to be unique given the DNS methodology.
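
As a rough illustration of the two example URIs above, the following Python sketch splits each one into its syntactic parts. Mapping those parts onto "protocol", "location" and "identifier" is an assumption made here for illustration; the `describe` helper is hypothetical and not part of any standard.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Split a URI into the parts discussed above (labels are illustrative)."""
    parts = urlsplit(uri)
    return {
        "protocol": parts.scheme,    # e.g. "http", or the made-up "identity"
        "location": parts.netloc,    # the DNS-based authority
        "identifier": parts.path,    # the name within that authority
    }

pure_identity = describe("identity://cim3.net/MyCat")
real_resource = describe(
    "http://www.cim3.net/CIM3_Executive_Brief_files/frame.htm"
)

for label, parsed in (("pure identity", pure_identity),
                      ("real resource", real_resource)):
    print(label, parsed)
```

Both strings parse the same way. The syntax alone does not say whether a retrieval mechanism exists for the scheme; that depends on whether software knows what to do with "identity" or "http".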

From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Ken Laskey
Sent: Friday, April 13, 2007 7:27 PM
To: edbark@xxxxxxxx
Cc: [ontolog-forum]
Subject: Re: [ontolog-forum] {Disarmed} Re: OWL and lack of identifiers

Just to be clear, from RFC 2396:

A Uniform Resource Identifier (URI) is a compact string of characters
for identifying an abstract or physical resource.

A resource can be anything that has identity. Familiar
examples include an electronic document, an image, a service
(e.g., "today's weather report for Los Angeles"), and a
collection of other resources. Not all resources are network
"retrievable"; e.g., human beings, corporations, and bound
books in a library can also be considered resources.


Thus, anything that can be identified is a resource (i.e., you can use everything for something) and URIs are one means (and one that has been found very useful) for providing that identity.

Ken


On Apr 13, 2007, at 3:10 PM, Ed Barkmeyer wrote:


Ken Laskey wrote:
 

 


When the URI is a reference to a Web page (full  stop), the resource is the web page, and by extension, the information  content of the web page.

I think of the page and its information content as  being separate.


 
From an ontological point of view, I may also want to distinguish the content from its external representation, if that was your point. But the Web does not make that distinction. Put another way, the Web consciously manages external representations of information, and leaves the abstraction of content to the reader. The whole idea of the Semantic Web is to provide standard external representations for some orderly abstraction of content, in order to facilitate search.
 

 
I find it important to distinguish the location of  the information from its content, which was my point. So perhaps we are talking past each  other.
 

 
But the definition of URI (IETF RFC 2396) says it  identifies a "resource".
 

 

For example, I can make statements about the style  of the page display, the server where the <html> tags reside, the  provenance information for the page.  These are all separate from the information content of the  page.


 
We have now identified several distinguishable  concepts:
 
1) the  place
 
2) the  presentation structure (web page)
 
3) the  information content
 
4) a formal  description of the content
 
5) the  "provenance metadata" for the content
 
6) the  provenance metadata for the presentation
 
7) the  provenance metadata for the presentation in that place
 

 
And we could easily make a model (ontology) for these  things and their relationships:
 
place(1)  conveys presentation(2)
 
presentation(2) conveys content(3)
 
content(3)  has formal description(4)
 
content(3)  has provenance of content(5)
 
presentation(2) has provenance of  presentation(6)
 
place(1) has  provenance of site content(7)
 

 
Further we note that there are other  possibilities. In particular,
 
place(1)  provides service(8)
 
service(8)  permits access to presentation(2)
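
The relationships listed so far could be written down as simple subject/relation/object triples. The Python sketch below is only one possible encoding; the tuple layout and the helper names `objects_of` and `reachable_from` are assumptions for illustration, while the concept numbers follow the list above.

```python
# Each triple is (subject, relation, object), using the numbered concepts above.
MODEL = [
    ("place(1)", "conveys", "presentation(2)"),
    ("presentation(2)", "conveys", "content(3)"),
    ("content(3)", "has", "formal description(4)"),
    ("content(3)", "has", "provenance of content(5)"),
    ("presentation(2)", "has", "provenance of presentation(6)"),
    ("place(1)", "has", "provenance of site content(7)"),
    ("place(1)", "provides", "service(8)"),
    ("service(8)", "permits access to", "presentation(2)"),
]

def objects_of(subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in MODEL if s == subject and r == relation]

def reachable_from(start):
    """Return everything linked to `start` by following relations forward."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        for s, _, o in MODEL:
            if s == node and o not in seen:
                seen.append(o)
                stack.append(o)
    return seen

print(objects_of("place(1)", "conveys"))
print(sorted(reachable_from("place(1)")))
```

In this encoding the place directly conveys only the presentation; everything else is reached through it or through the service.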
 

 
RFC 2396 is pretty clear that a URL identifies a  place(1) full stop, and indicates a means of access to whatever is at that  place. From our would-be ontology  above, what is thus addressed is either a presentation/document or a  service.
 

 
By comparison, RFC 2396 says that a URI identifies a  "resource". And all of  (2),(4),(5),(6),(7) and the service (8) are distinct resources that may be  found at the *same site*. (I think  the Web view is that content(3) is only accessible through its  presentation(2).) It follows that  each of them should have a distinct URI.  Those URIs may be distinct URLs in their own right, or they may all  incorporate a common URL and each have a distinct fragment identifier.
 

 
Since a URL always identifies a place, if the  distinct resources have distinct URLs, our model above needs some  additions:
 
place(1)  conveys formal description(4)
 
place(1)  conveys provenance of content(5)
 
place(1)  conveys provenance of presentation(6)
 
place(1)  conveys provenance of site content(7)
 

 
One place can convey some or all of  (2),(4),(5),(6),(7),(8), but when one place conveys more than one of them,  each has a distinct URI whose "fragment identifier" distinguishes the  "component". And by convention, in  those cases, the URI with no fragment identifier (the simple URL) conveys  either (2) or (8). It is also  possible that we have a (9), which is a web page that is a container for  (2),(4),(5),(6),(7), delivered as a single resource.
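
A small Python sketch of the fragment-identifier convention just described. The base URL and the fragment names are hypothetical, chosen only to show several distinct URIs sharing one place.

```python
from urllib.parse import urldefrag

# Hypothetical site: one place conveying several distinct resources.
BASE = "http://example.org/report"

uris = {
    "presentation(2)": BASE,  # by convention, the URI with no fragment
    "formal description(4)": BASE + "#description",
    "provenance of content(5)": BASE + "#content-provenance",
    "provenance of presentation(6)": BASE + "#presentation-provenance",
}

def same_place(uri_a, uri_b):
    """Two URIs name the same place if they match once fragments are removed."""
    return urldefrag(uri_a).url == urldefrag(uri_b).url

all_distinct = len(set(uris.values())) == len(uris)
all_same_place = all(same_place(BASE, u) for u in uris.values())
print(all_distinct, all_same_place)
```

Each resource gets its own identifier, yet stripping the fragment shows that all of them resolve to the one place.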
 

 
Note that our model is starting to get rather  messy.
 
This is why Tim Berners-Lee says you need to impose some discipline on your site. The problem is that several different conventions have emerged (including not imposing any discipline), and there are no reference standards.
 

 
In a somewhat different vein, I wrote:
 

 


I have argued with TBL before that URIs that are  URLs confuse WHAT something is with WHERE it is. And it is only an acceptable idea when  that relationship is required to be 1-to-1. The idea of identifiers is that you  can test for equal. When the same  thing can be in multiple places, unequal doesn't tell me anything, which  is annoying, especially when tools think unequal to the expected value  means unusable. And when the same  place can hold different things, equal doesn't tell me anything, which  defeats the purpose.
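
The two failure cases in that argument can be made concrete with a short Python sketch. The URLs and contents are hypothetical, and comparing contents by SHA-256 digest is an assumption used here only to stand in for "the same thing"; it is not a mechanism proposed in this thread.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Digest used as a stand-in for 'the same content' (an assumption)."""
    return hashlib.sha256(content).hexdigest()

def compare(id_a, content_a, id_b, content_b):
    """Report whether the identifiers match and whether the contents match."""
    return {
        "same_identifier": id_a == id_b,
        "same_content": fingerprint(content_a) == fingerprint(content_b),
    }

# Case 1: the same thing in two places. Unequal identifiers tell us nothing.
spec = b"Specification, edition 1"
mirrored = compare("http://example.org/spec", spec,
                   "http://mirror.example.net/spec", spec)

# Case 2: the same place holding different things. Equal identifiers tell us nothing.
reused = compare("http://example.org/today", b"a bank",
                 "http://example.org/today", b"a laundry")

print(mirrored)
print(reused)
```

Only when place and thing are 1-to-1 does the identifier comparison agree with the content comparison in both directions.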


 
Ken says:
 

What you are saying is it doesn't serve the purpose  you have in mind, not that it doesn't serve other purposes quite well. One could say the success of the Web  shows a real value.


 
Whoa! I  fully agree that URLs locate lots of useful and functionally different things,  just as postal addresses do. But if  today it's a bank and tomorrow it's a laundry or a residence or a casino, what  "resource" is being "identified"?
 

 
What I said was that if the content to which a URI  refers changes radically from day to day, the URI doesn't identify "an  information resource" in any useful sense.  And thus the idea that the URI identifies something different from a  location is false. If the purpose of  a URI is to denote content, function, behavior, as distinct from location,  some one of those has to be consistent over time. A bulletin board and a pulpit are just  locations.
 

 


(I wonder how many XML tools would break if the  namespace URL for XML Schema pointed to a local copy of the  specification... Is the W3C URI  THE name or A name for the XML Schema specification?)

This is where provenance comes in. It is THE URI if you believe W3C to be  the authoritative source.


 
This confuses two ideas:
 
1. The location of the document
 
2. The identity of the document as the one  issued by the authoritative source.
 

 
Example: The  authoritative source for the Oxford Dictionary of English is presumably in  Oxford, England, but I can find the document at my public library.
 

 
All of the copies of the ODE have the same  designation, but you can find copies in lots of places. So if I point you to a place where you can  find it, that has nothing to do with the authoritative source.
 

 
But my example was wrong. The xmlns reference is to the "namespace  URI", which is the required *identifier* for the specification. The tool is free to get a copy from  anywhere it likes. So if I put  another URL there, it may be a location of a copy of the specification, but it  is NOT the *identifier*, and the tool should fail. It is exactly as if I referred to the  "Peoria Public Library's dictionary" instead of the ODE.
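
A minimal Python sketch of that point, using the standard library's ElementTree parser. The XML Schema namespace name is the real one; the "local copy" URL and the `is_xsd_schema` helper are hypothetical. The namespace URI is compared as a plain string and nothing is fetched from either address.

```python
import xml.etree.ElementTree as ET

# The namespace name for XML Schema, used purely as an identifier.
XSD_NS = "http://www.w3.org/2001/XMLSchema"

def is_xsd_schema(document: str) -> bool:
    """True if the root element is <schema> in the XML Schema namespace."""
    root = ET.fromstring(document)
    # ElementTree expands qualified names to "{namespace-uri}localname".
    return root.tag == "{%s}schema" % XSD_NS

official = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'
# Hypothetical URL of a local copy of the specification.
local_copy = '<xs:schema xmlns:xs="http://example.org/local/XMLSchema"/>'

print(is_xsd_schema(official))
print(is_xsd_schema(local_copy))
```

The second document is rejected even if a perfect copy of the specification lives at that address, because the string is being used as a name rather than as a location.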
 

 


The webhead idea is that you will always go to  the URL, fetch the resource, and use it.  The idea that a tool has been pre-programmed to support that  *content*, and, in conducting a web-based transaction, this might require  the tool to fetch and compare two 10MB files to determine whether they are  *versions of* the same specification, is beyond their hobbyist view of the  Internet.

So what metadata do you need in place to support  your use? How do you want to create  and maintain that metadata? Will  you make it available for others to use?


 
Ah, now we are talking about what "responsible  management" of referenceable resources might be. This is the kind of discipline that the  WebDAV folks have worked on, and there is a "widely accepted" scheme for life  cycle management of documents. The  trouble is that it is widely accepted among the various organizations involved  in making document and metadata standards, but those folks operate and  influence less than 1% of websites.  It does mean that publishers, and standards organizations, and library  websites will probably use it.
 

 

Everything is a resource to someone, as it should  be. What we want to be able to do  is differentiate resources so we use the one(s) most suitable for our  needs.


 
Exactly. But  unless there are common conventions for that differentiation, all we have is a  bunch of disorganized resources labeled according to hundreds or thousands of  incompatible schemes, most of which are not very good or very useful. Google has built a successful enterprise  on the failure of the Web, and its principal resources, to address that  problem. And there are many who  believe that that also is as it should be.
 

 
IMO, the problem is that the Internet is still the big city of the Middle Ages. We know how to build all kinds of buildings and we have a lot of demand for them and a lot of construction of various kinds and qualities going on. But no one is responsible for much of it, we have no civil engineering discipline, we have no land use planning, we have random patchworks of streets, we are carrying the water on foot in buckets from the most convenient well, we have no police force and no fire brigade, we have sewage problems, crime problems and frequent plagues. Some communities thrive and some die out, and we don't really understand why. And yet people keep coming here, because there is education, and jobs, and entertainment, and money to be made. Ultimately, technology enabled us to get control of it, and fires and plagues forced us to. But it took 7 centuries. I hope the Internet experience is shorter.
 

 
-Ed
 

 


_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx