Re: [ontolog-forum] Globally unique definitions (was tasteful tags)

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Jack Park <jack.park@xxxxxxx>
Date: Sat, 10 Mar 2007 07:58:35 -0800
Message-id: <45F2D5AB.4040306@xxxxxxx>
Perhaps this thread should have been called "Globally unique 
identifiers" or maybe "Ways to identify things".    (01)

Consider this:  visit
to see what happens when you use a name for something as an identifier. 
It seems, these days, that people are starting to use Wikipedia URLs as 
URIs, akin to saying, "Go here <wikipedia url> to see what I'm talking 
about" (not unlike footnotes as identity helpers).  Back to the 
Wikipedia page. Notice what happened when Game_Theory (the rock band) 
came along, and Game_Theory (the board game) came along. Names for 
things collapsed and ad hoc solutions (not bad ones at that!) were 
invented. There's a page identified with Game_Theory(disambiguate). 
And, it's mentioned on each possible ambiguous hit around Game_Theory 
just to let you know that if this page doesn't satisfy, check "here".    (02)

I don't have the historical background to say precisely when 
back-of-the-book indexes were invented, but we all use them; Amazon puts them up 
(for some books) so you can get a good feel of the subjects to be found 
in the book. It's not unreasonable to grant each of those subjects a 
unique identifier, one perhaps coined with the book's title in it. This 
means that although the subject that carries the name, say, "President Bush" 
might itself have a URI, it's still going to be ambiguous. Which 
President Bush would it identify? One could go so far as to say more, e.g. 
President_Bush_1 (as compared to President_Bush_2), but those URIs would 
not necessarily be useful to humans.    (03)

Nobody I know argues that URIs are not extremely valuable. But, there 
are two contexts for their use, one being machine-machine interactions, 
and another being human-machine interactions. As I understand things, 
the semantic web is a support framework for machine-machine interactions 
and, in that context, URIs should be supported.  It strikes me that 
several types of mechanisms exist or are being created to deal with the 
possibility that some particular subject, say, President Bush 1, might 
actually have more than one URI assigned to him; to presume otherwise 
would also presume an identity server in place, one that every machine 
and human on the planet uses when deriving new URIs. That is to say, an 
identity server, one strong enough to satisfy all possible users and 
subjects.    (04)

I will state that I strongly believe that the URI (or is it now the IRI?) 
system being used for RDF artifacts throughout the semweb community 
represents a sound practice. I will argue, however, that it opens the 
door to ambiguity. I believe that ambiguity interferes with 
productivity, and with eventual merging of topics for any reason. If the 
terrorism ontologies of the world use, say, rdf:ID="osama_bin_laden" to 
name a particular person instance, then that ontology is at risk of 
being dangerous when someone else using that same name comes along.    (05)
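
As a rough illustration of the naming risk, here is a minimal Python sketch. 
It assumes the usual RDF/XML reading in which rdf:ID="x" denotes the base URI 
plus "#x"; the two example.org/example.net base URIs and the helper name 
expand_rdf_id are hypothetical, invented only for this illustration.

```python
from urllib.parse import urljoin

def expand_rdf_id(base_uri, local_id):
    """Resolve an rdf:ID-style local name against a document base URI."""
    return urljoin(base_uri, "#" + local_id)

# Two hypothetical ontologies that both coin the local name "osama_bin_laden".
uri_a = expand_rdf_id("http://example.org/terrorism.owl", "osama_bin_laden")
uri_b = expand_rdf_id("http://example.net/people.rdf", "osama_bin_laden")

# The URIs differ, but nothing in either string tells a machine whether
# they name the same person or two different people with the same name.
print(uri_a)
print(uri_b)
print(uri_a == uri_b)
```

The name inside the URI does no disambiguating work for the machine; it only 
looks meaningful to the human reader.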

I suspect that there is going to be a need for a global identifier 
generator, one that resolves ambiguity before it can happen. For that to 
work, URIs (IRIs) will likely *not* contain names; perhaps they will be 
closer to the random hex/integer strings in GUID (UUID) values. To do that, 
there will need to be a zillion different ways of actually mapping that 
new URI to the specific subject in question. Tribes have different ways 
of identifying their artifacts, so, such an identity server will 
necessarily need the ability to provide for the many possible ways to 
identify specific subjects, and ambiguity is still going to happen. 
Here, the mechanism suggested is one of a possibly institutionalized 
identity server, and that server, I would argue, is simply a subject map.    (06)
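
To make the idea concrete, here is a toy Python sketch of such an identity 
server: it hands out opaque, name-free identifiers and lets each subject be 
reached through many different identifying keys. The class name SubjectMap, 
the key schemes ("us_president_number", "wikipedia") and the urn:uuid form 
are all assumptions made for this illustration, not an existing system.

```python
import uuid

class SubjectMap:
    """Toy identity registry: opaque IDs, many ways to identify a subject."""

    def __init__(self):
        self._by_key = {}   # (scheme, value) -> opaque subject id
        self._keys = {}     # opaque subject id -> set of (scheme, value)

    def register(self, *keys):
        """Return the ID for the subject identified by any of the keys."""
        known = {self._by_key[k] for k in keys if k in self._by_key}
        if len(known) > 1:
            # The keys already point at different subjects: ambiguity
            # that some human or rule has to resolve.
            raise ValueError("keys identify more than one subject")
        subject_id = known.pop() if known else "urn:uuid:%s" % uuid.uuid4()
        for k in keys:
            self._by_key[k] = subject_id
        self._keys.setdefault(subject_id, set()).update(keys)
        return subject_id

    def lookup(self, scheme, value):
        """Return the opaque ID for a key, or None if it is unknown."""
        return self._by_key.get((scheme, value))

m = SubjectMap()
bush_1 = m.register(("us_president_number", "41"),
                    ("wikipedia", "George_H._W._Bush"))
bush_2 = m.register(("us_president_number", "43"))
# A different tribe, using only its own key, reaches the same subject.
again = m.register(("wikipedia", "George_H._W._Bush"))
print(bush_1 == again, bush_1 == bush_2)
```

Note that the sketch still raises an error when two keys disagree, which is 
the residual ambiguity mentioned above.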

Even large-scale conceptual graphs will have this same issue raised from 
time to time. While all our literature appeals to the toy problems 
associated with early AI, e.g. "The cat sat on the mat", clearly, "Bush 
spoke with Putin" is potentially, even dangerously ambiguous. Forgive 
the bluntness in this, but we cannot continue using toy scenarios as we 
conduct our research and development moving forward. If we are to make 
the progress in, say, climate studies and cancer research that we claim 
we wish to make, there will be less room for balkanized ontologies, and 
more demand for ontologies that serve larger purposes, merge into other 
domains (synergy, chance discovery), and are also usable in the 
human-machine interaction context.    (07)

We don't have to wait for an institutionalized, quite possibly 
balkanized subject identity server; we can simply start by crafting 
subject maps into the web as we go. When someone uses a URI/IRI in their 
RDF, just make available a mapping that provides enough properties 
associated with that URI/IRI to disambiguate it from others.    (08)
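
A minimal Python sketch of that kind of mapping follows. The URIs, the 
property names ("label", "kind") and the function group_by_identity are 
hypothetical; the point is only that a few published properties let a 
program separate or merge URIs that share a name.

```python
from collections import defaultdict

# Hypothetical published mappings: URI -> properties describing its subject.
SUBJECTS = {
    "http://example.org/a#Game_Theory": {
        "label": "Game Theory", "kind": "field of mathematics"},
    "http://example.org/b#Game_Theory": {
        "label": "Game Theory", "kind": "rock band"},
    "http://example.org/c#GT": {
        "label": "Theory of games", "kind": "field of mathematics"},
}

def group_by_identity(subjects, identifying_props):
    """Group URIs whose identifying properties all carry the same values."""
    groups = defaultdict(list)
    for uri, props in subjects.items():
        key = tuple(props.get(p) for p in identifying_props)
        if None in key:
            # Not enough information to decide: keep the URI on its own.
            key = ("unresolved", uri)
        groups[key].append(uri)
    return sorted(sorted(uris) for uris in groups.values())

# Same label, different subjects; different label, same subject.
for group in group_by_identity(SUBJECTS, ("kind",)):
    print(group)
```

Grouping on "label" alone would wrongly merge the band with the field of 
mathematics, which is the collapse of names described earlier.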

To ignore this, I would argue, is to balkanize the web. Patrick Durusau 
and I spoke about such matters to this tribe already [1]. We argue that 
subject maps, even ones written in OWL, offer a simple means to 
disambiguate the web without resorting to an institutionalized, central 
identity server. Simply represent what you mean by the use of some URI 
in a self-documenting way such that reasoning engines can make 
distinctions. We call that subject mapping.    (09)

[1] http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2006_04_27    (010)

Pat Hayes wrote:
>> Pat,
>> The global postal address (name, street, city, country)
>> was adequate as a globally unique id for data transmission
>> (snail mail) for centuries.
> Only for about two centuries, and only in some countries. It didn't 
> work in China until well after WW2, for example, and there are still 
> many areas of the world where it does not work.  But in any case, 
> this kind of quibbling is silly: we are clearly talking about 
> computerized networks.
>>   And we've had globally unique
>> ids for computer-like usage since the 1930s (Social Security
>> Numbers on punched-card machines).
> A SS number isn't going to do you much good outside the USA. Large 
> though it is, the US is less than 5% of the world's total population 
> (300 million in 6.7 billion)
>>  They also supported data
>> storage and transmission (by putting cards on a truck and
>> shipping them to globally unique addresses).
> Again, not a lot of use across continental distances.
>> These techniques may sound primitive by today's standards,
>> but there is nothing in principle that is different about
>> URLs except speed and convenience.
> I profoundly disagree. The chief difference in principle is that the 
> transmission can be done without human aid, and at near-instantaneous 
> speeds, and is now genuinely world-wide.
>>  On a related note, see
>> the proposed standard for TCP/IP links via carrier pigeon:
>>    http://www.ietf.org/rfc/rfc1149.txt
>> That may be jocular, but it could work.
>> JFS>> With the advent of computer systems, programming tools
>>>>  forced those "footnotes" to be made explicit, and they
>>>>  have done so remarkably well since the 1950s.
>> PH> I guess I don't follow what you mean here.
>> The globally unique ids such as addresses and SSN's were
>> put on the computer.  By piling up enough of them, you
>> could get a unique key for anything in the world you wanted
>> to designate.
> No, you couldn't. But even if you were right, what has this got to do 
> with footnotes?
>> That technique has been used in database
>> systems for over 40 years.
>> Admittedly, the methods were not uniform across all
>> implementations, they've been working for years.
>> PH> There were no global references before the Web.
>> No.  See above.
>> PH> Well, you can do things like this with redirects and so on.
>>>  People often don't bother in simple cases, but there is a
>>>  lot of PHP out there. For example, my URI
>> But that involves building an ad hoc redirecting kludge on top
>> of a global system instead of building an ad hoc global kludge
>> on top of a local system.
> It is not an ad hoc kludge. Redirection is part of the basic 
> machinery of the internet, and has been since day one. Well, maybe 
> day two. Read Roy Fielding's thesis.
>> JFS> I still remember the early days of PCs, in which names
>>>  of files and disk drives were hard coded in the programs
>>>  and people complained about those hard coded connections.
>>>  They pleaded for greater flexibility by parameterizing
>>>  the connections.  But now the SemWeb has returned to the
>>>  good (?) or bad (?) old days with hard coded connections.
>> PH> Not at all. In fact nothing could be further from the truth,
>>>  cf http://www.w3.org/Provider/Style/URI
>> That reference just recommends some naming guidelines.  What I
>> was talking about is a context-dependent reference system, which
>> could be dynamically linked into various contexts.
> So was the reference cited.
>>  That is the
>> typical programming-library notion that programmers have been
>> using since the early '60s.
>> For example, there are two fundamentally different ways of
>> linking a term, say "vehicle", into an environment:
>>  1. Globally unique: a fixed definition that is independent
>>     of any other definition for "vehicle" in any environment.
>>  2. Context dependent: the term "vehicle" is linked to whatever
>>     definition for "vehicle" is used in the current environment.
>> Method #1 is the default for the WWW, and method #2 must be
>> supported by some ad hoc kludge.
> Really, John, you should find out more about the actual Web 
> architecture before saying such crazy things. URIs are globally 
> unique. But that says nothing about global uniqueness of 
> *definitions*, and a URI is not an address, it is an identifier. The 
> process of getting from the URI to the 'place' where the identified 
> resource is located - the http protocol specification - can be very 
> complicated and intricate, and can easily accommodate your method #2. 
> And this is not an ad hoc kludge, it is part of the most basic 
> protocol definition of the entire Web, the process by which 
> information is transferred across the entire internet; and in fact it 
> is widely used in large professional websites.
>>  Method #2 is the default for
>> programming libraries, and method #1 is supported by a variety
>> of different methods (including the option of using a URI).
>> And I'd also like to mention a third approach:
>>  3. Unique, but mobile:  a globally unique id, which allows
>>     agents to move around the WWW while retaining their own
>>     identity, independent of the computer (or device or cell
>>     phone) on which it happens to be located
> Yes, URIs can do that, but not ...
>> -- and without
>>     any assistance (or even recognition) from the sender or
>>     the domain name servers of the Internet.
> ...this. Nor should they. Why would you want any Internet-based 
> communication process to be done without assistance from the Internet 
> servers? Seems crazy to me.
>> Method #3 can be simulated by implementing a virtual network
>> on top of the Internet, and I'm sure that such things will
>> proliferate over time.
> Well, there have been many such schemes. People at IHMC have one, 
> which supports the CmapTools server/client system with a worldwide 
> distribution. And like all such schemes, it is extremely hard to 
> maintain, less efficient than simply using the Web, and doomed to 
> ultimate failure. The fact is that all such schemes are inevitably 
> overtaken by the exponents in Moore's Law within a few years. It's 
> far, far cheaper to just use the actual Web, as well as being more 
> robust.
>> All of these are important naming schemes, and they should all
>> be supported by some systematic scheme instead of being built
>> by an ad hoc kludge (such as a special-purpose PHP program).
> By your conventions, any programming counts as an ad-hoc kludge. PHP 
> is widely used, and so are several other redirecting systems. They 
> all work within the overarching architectural and semantic specs 
> which define the Web. They *work*, to repeat. Why are they kludges??
> Pat
>> John
>     (011)
