
Re: [ontolog-forum] Globally unique definitions (was tasteful tags)

To: Jack Park <jack.park@xxxxxxx>
Cc: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Pat Hayes <phayes@xxxxxxx>
Date: Mon, 12 Mar 2007 02:21:06 -0600
Message-id: <p06230906c21ab3c44ab3@[]>
>Perhaps this thread should have been called "Globally unique
>identifiers" or maybe "Ways to identify things".
>Consider this:  visit
>to see what happens when you use a name for something as an identifier.
>It seems, these days, that people are starting to use Wikipedia URLs as
>URIs, akin to saying, "Go here <wikipedia url> to see what I'm talking
>about" (not unlike footnotes as identity helpers).  Back to the
>Wikipedia page. Notice what happened when Game_Theory (the rock band)
>came along, and Game_Theory (the board game) came along. Names for
>things collapsed and ad hoc solutions (not bad ones at that!) were
>invented. There's a page identified with Game_Theory(disambiguate).
>And, it's mentioned on each possible ambiguous hit around Game_Theory
>just to let you know that if this page doesn't satisfy, check "here".    (01)

All of which illustrates the failure of 
traditional NL human names when placed in a wider 
context than the one in which they are coined. 
URIs however are created in a global context, and 
can always be traced back to a 'source'. It might 
not be a very useful or informative source, but 
it is there and it is unique, which is a start.    (02)
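
A minimal sketch of the point, not anything taken from Wikipedia or from
the Web specs: two catalogs each coin the bare name "Game_Theory", and
merging on bare names loses one meaning, while resolving each name
against the base URI of its source keeps both. The catalogs, the base
URIs (example.org, example.net) and the helper qualify() are all
hypothetical.

```python
from urllib.parse import urljoin

# Two hypothetical sources that each coin the bare name "Game_Theory".
band_catalog = {"Game_Theory": "a rock band"}
game_catalog = {"Game_Theory": "a board game"}

# Merging on bare names silently keeps only one of the two meanings.
merged_by_name = {**band_catalog, **game_catalog}


def qualify(base, catalog):
    """Resolve each local name as a fragment against its source's base URI."""
    return {urljoin(base, "#" + name): meaning
            for name, meaning in catalog.items()}


# Hypothetical base URIs; the example.* domains are placeholders.
merged_by_uri = {
    **qualify("http://music.example.org/catalog", band_catalog),
    **qualify("http://games.example.net/catalog", game_catalog),
}

print(len(merged_by_name))  # one entry survives the merge
print(len(merged_by_uri))   # both entries survive
```

The URIs say nothing more about what is meant than the bare names did;
they only guarantee that the two coinages stay distinct and traceable to
their sources.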

>I don't have the historical background to say precisely when back of the
>book indexes were invented, but we all use them; Amazon puts them up
>(for some books) so you can get a good feel of the subjects to be found
>in the book. It's not unreasonable to grant each of those subjects a
>unique identifier, one perhaps coined with the book's title in it. This
>means that the subject that carries the name, say, "President Bush"
>itself might have a URI, it's still going to be ambiguous. Which
>President Bush would it identify? Could go so far as to say more, e.g.
>President_Bush_1 (as compared to President_Bush_2), but those URIs would
>not necessarily be useful to humans.
>Nobody I know argues that URIs are not extremely valuable. But, there
>are two contexts for their use, one being machine-machine interactions,
>and another being human-machine interactions. As I understand things,
>the semantic web is a support framework for machine-machine interactions
>and, in that context, URIs should be supported.  It strikes me that
>several types of mechanisms exist or are being created to deal with the
>possibility that some particular subject, say, President Bush 1, might
>actually have more than one URI assigned to him    (03)

He might well. That isn't a problem or a source 
of ambiguity. The Web architecture explicitly 
allows multiple different IRIs to refer to the 
same resource.    (04)
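
As a rough illustration of how software can live with several names for
one thing, here is a sketch that groups identifiers once they have been
declared to co-refer (for instance by statements such as owl:sameAs in
OWL). The IRIs and the functions build_alias_index() and co_refer() are
hypothetical, and the union-find grouping is just one convenient way to
do the bookkeeping, not something the Web architecture prescribes.

```python
def build_alias_index(same_as_pairs):
    """Group identifiers declared to co-refer, using union-find."""
    parent = {}

    def find(x):
        # Unknown identifiers start out as their own group.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    return find


# Hypothetical IRIs minted independently by three sources.
pairs = [
    ("http://a.example.org/people#bush41", "http://b.example.net/id/ghw-bush"),
    ("http://b.example.net/id/ghw-bush", "http://c.example.com/presidents/41"),
]
find = build_alias_index(pairs)


def co_refer(x, y):
    """True if x and y have been linked, directly or transitively."""
    return find(x) == find(y)
```

With these pairs, co_refer() links the first and third IRIs through the
second, while an IRI that nobody has linked stays in a group of its own.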

>; to presume otherwise
>would also presume an identity server in place, one that every machine
>and human on the planet uses when deriving new URIs. That is to say, an
>identity server, one strong enough to satisfy all possible users and
>I will state that I strongly believe that the URI, or is it now the IRI
>system being used for RDF artifacts throughout the semweb community
>represents a sound practice. I will argue, however, that it opens the
>door to ambiguity.    (05)

Can you say why? It seems to me that while IRIs 
are of course ambiguous as referring names 
(virtually all referring names are ambiguous), 
they aren't any MORE ambiguous than other names; 
and they are globally *unique*, which at least 
does cut down one possible reason for ambiguity. 
So I wonder why you say 'open the door' here.    (06)

>I believe that ambiguity interferes with
>productivity    (07)

I wonder why people are so worried about 
ambiguity? It's like being concerned that there is 
too much nitrogen in the atmosphere. Ambiguity is 
inevitable, it's useful, and there is no way at 
all to get rid of it. We should learn to use it 
constructively, just as human beings (or maybe 
human nervous systems) have done since there has 
been human language.    (08)

>, and with eventual merging of topics for any reason. If the
>terrorism ontologies of the world use, say, rdf:ID="osama_bin_laden" to
>name a particular person instance, then that ontology is at risk of
>being dangerous when someone else using that same name comes along.    (09)

Don't worry. If the terrorist organizations were 
this stupid we would have beaten them long ago.    (010)

>I suspect that there is going to be a need for a global identifier
>generator, one that resolves ambiguity before it can happen.    (011)

It will no doubt be made of kryptonite.    (012)

>For that to
>work, URIs (IRIs) will likely *not* contain names; perhaps they will be
>closer to the random hex/integer strings in GUUID values.    (013)

We seem to be at cross purposes. URIs and IRIs 
*are* names. Long, inhuman names, but names for 
all that. They are names because they are simple 
linguistic entities which refer. And BTW, URIs 
often do contain long numerical strings: ever 
look at the URIs that encode a Google query or 
identify a Flickr image?    (014)

>  To do that,
>there will need to be a zillion different ways of actually mapping that
>new URI to the specific subject in question.    (015)

Which is obviously impossible. What the Web can 
do is to link URIs to whatever it is on the Web 
that they identify, but no single mechanism can 
possibly take one from a name to its referent.    (016)

>Tribes have different ways
>of identifying their artifacts    (017)

Names refer to other things than artifacts. There 
are naming schemes for galaxies, for just one 
example.    (018)

>, so, such an identity server will
>necessarily need the ability to provide for the many possible ways to
>identify specific subjects, and ambiguity is still going to happen.
>Here, the mechanism suggested is one of a possibly institutionalized
>identity server, and that server, I would argue, is simply a subject map.
>Even large-scale conceptual graphs will have this same issue raised from
>time to time. While all our literature appeals to the toy problems
>associated with early AI, e.g. "The cat sat on the mat", clearly, "Bush
>spoke with Putin" is potentially, even dangerously ambiguous. Forgive
>the bluntness in this, but we cannot continue using toy scenarios as we
>conduct our research and development moving forward.    (019)

I entirely agree.    (020)

>If we are to make
>the progress in, say, climate studies and cancer research that we claim
>we wish to make, there will be less room for balkanized ontologies, and
>more demand for ontologies that serve larger purposes, merge into other
>domains (synergy, chance discovery), and are also usable in the
>human-machine interaction context.
>We don't have to wait for an institutionalized, quite possibly
>balkanized subject identity server; we can simply start by crafting
>subject maps into the web as we go. When someone uses a URI/IRI in their
>rdf, just make available a mapping that provides enough properties
>associated with that URI/IRI to disambiguate it from others.    (021)

But surely, this is exactly what the RDF or OWL is supposed to do.    (022)
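
To make that concrete, a small sketch with plain tuples standing in for
RDF triples: two URIs share a name, and the other properties asserted
about them are what tell them apart. The URIs, the property names
("name", "tookOffice") and the helper functions are hypothetical, and
the comparison only makes sense under the assumption that each property
takes a single value per subject.

```python
# Triples as (subject, property, value). URIs and property names are
# hypothetical placeholders, not terms from any published vocabulary.
triples = [
    ("http://a.example.org/id/bush_1", "name", "President Bush"),
    ("http://a.example.org/id/bush_1", "tookOffice", 1989),
    ("http://a.example.org/id/bush_2", "name", "President Bush"),
    ("http://a.example.org/id/bush_2", "tookOffice", 2001),
]


def describe(uri, triples):
    """Collect the property/value pairs asserted about one URI."""
    return {p: v for s, p, v in triples if s == uri}


def distinguishing_properties(uri_a, uri_b, triples):
    """Properties both URIs carry but with different values.

    Assumes each property is single-valued per subject; otherwise a
    difference in values would not show the subjects are distinct.
    """
    a, b = describe(uri_a, triples), describe(uri_b, triples)
    return sorted(p for p in a.keys() & b.keys() if a[p] != b[p])


print(distinguishing_properties(
    "http://a.example.org/id/bush_1",
    "http://a.example.org/id/bush_2",
    triples,
))
```

Here the shared "name" does nothing to separate the two subjects, and
"tookOffice" does all the work.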

>To ignore this, I would argue, is to balkanize the web. Patrick Durusau
>and I spoke about such matters to this tribe already [1]. We argue that
>subject maps, even ones written in OWL, offer a simple means to
>disambiguate the web and without resort to an institutionalized, central
>identity server. Simply represent what you mean by the use of some URI
>in a self-documenting way such that reasoning engines can make
>distinctions. We call that subject mapping.    (023)

Everyone else calls it writing good ontologies :-)    (024)

Pat Hayes    (025)

>[1] http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2006_04_27
>Pat Hayes wrote:
>>>  Pat,
>>>  The global postal address (name, street, city, country)
>>>  was adequate as a globally unique id for data transmission
>>>  (snail mail) for centuries.
>>  Only for about two centuries, and only in some countries. It didn't
>>  work in China until well after WW2, for example, and there are still
>>  many areas of the world where it does not work.  But in any case,
>>  this kind of quibbling is silly: we are clearly talking about
>>  computerized networks.
>>>    And we've had globally unique
>>>  ids for computer-like usage since the 1930s (Social Security
>>>  Numbers on punched-card machines).
>>  A SS number isn't going to do you much good outside the USA. Large
>>  though it is, the US is less than 5% of the world's total population
>>  (300 million in 6.7 billion)
>>>   They also supported data
>>>  storage and transmission (by putting cards on a truck and
>>>  shipping them to globally unique addresses).
>>  Again, not a lot of use across continental distances.
>>>  These techniques may sound primitive by today's standards,
>>>  but there is nothing in principle that is different about
>>>  URLs except speed and convenience.
>>  I profoundly disagree. The chief difference in principle is that the
>>  transmission can be done without human aid, and at near-instantaneous
>>  speeds, and is now genuinely world-wide.
>>>   On a related note, see
>>>  the proposed standard for TCP/IP links via carrier pigeon:
>>>     http://www.ietf.org/rfc/rfc1149.txt
>>>  That may be jocular, but it could work.
>>>  JFS>> With the advent of computer systems, programming tools
>>>>>   forced those "footnotes" to be made explicit, and they
>>>>>   have done so remarkably well since the 1950s.
>>>  PH> I guess I don't follow what you mean here.
>>>  The globally unique ids such as addresses and SSN's were
>>>  put on the computer.  By piling up enough of them, you
>>>  could get a unique key for anything in the world you wanted
>>>  to designate.
>>  No, you couldn't. But even if you were right, what has this got to do
>>  with footnotes?
>>>  That technique has been used in database
>>>  systems for over 40 years.
>>>  Admittedly, the methods were not uniform across all
>>>  implementations, but they've been working for years.
>>>  PH> There were no global references before the Web.
>>>  No.  See above.
>>>  PH> Well, you can do things like this with redirects and so on.
>>>>   People often don't bother in simple cases, but there is a
>>>>   lot of PHP out there. For example, my URI
>>>  But that involves building an ad hoc redirecting kludge on top
>>>  of a global system instead of building an ad hoc global kludge
>>>  on top of a local system.
>>  It is not an ad hoc kludge. Redirection is part of the basic
>>  machinery of the internet, and has been since day one. Well, maybe
>>  day two. Read Roy Fielding's thesis.
>>>  JFS> I still remember the early days of PCs, in which names
>>>>   of files and disk drives were hard coded in the programs
>>>>   and people complained about those hard coded connections.
>>>>   They pleaded for greater flexibility by parameterizing
>>>>   the connections.  But now the SemWeb has returned to the
>>>>   good (?) or bad (?) old days with hard coded connections.
>>>  PH> Not at all. In fact nothing could be further from the truth,
>>>>   cf http://www.w3.org/Provider/Style/URI
>>>  That reference just recommends some naming guidelines.  What I
>>>  was talking about is a context-dependent reference system, which
>>>  could be dynamically linked into various contexts.
>>  So was the reference cited.
>>>   That is the
>>>  typical programming-library notion that programmers have been
>>>  using since the early '60s.
>>>  For example, there are two fundamentally different ways of
>>>  linking a term, say "vehicle", into an environment:
>>>   1. Globally unique: a fixed definition that is independent
>>>      of any other definition for "vehicle" in any environment.
>>>   2. Context dependent: the term "vehicle" is linked to whatever
>>>      definition for "vehicle" is used in the current environment.
>>>  Method #1 is the default for the WWW, and method #2 must be
>>>  supported by some ad hoc kludge.
>>  Really, John, you should find out more about the actual Web
>>  architecture before saying such crazy things. URIs are globally
>>  unique. But that says nothing about global uniqueness of
>>  *definitions*, and a URI is not an address, it is an identifier. The
>>  process of getting from the URI to the 'place' where the identified
>>  resource is located - the http protocol specification - can be very
>>  complicated and intricate, and can easily accommodate your method #2.
>>  And this is not an ad hoc kludge, it is part of the most basic
>>  protocol definition of the entire Web, the process by which
>>  information is transferred across the entire internet; and in fact it
>>  is widely used in large professional websites.
>>>   Method #2 is the default for
>>>  programming libraries, and method #1 is supported by a variety
>>>  of different methods (including the option of using a URI).
>>>  And I'd also like to mention a third approach:
>>>   3. Unique, but mobile:  a globally unique id, which allows
>>>      agents to move around the WWW while retaining their own
>>>      identity, independent of the computer (or device or cell
>>>      phone) on which it happens to be located
>>  Yes, URIs can do that, but not ...
>>>  -- and without
>>>      any assistance (or even recognition) from the sender or
>>>      the domain name servers of the Internet.
>>  ...this. Nor should they. Why would you want any Internet-based
>>  communication process to be done without assistance from the Internet
>>  servers? Seems crazy to me.
>>>  Method #3 can be simulated by implementing a virtual network
>>>  on top of the Internet, and I'm sure that such things will
>>>  proliferate over time.
>>  Well, there have been many such schemes. People at IHMC have one,
>>  which supports the CmapTools server/client system with a worldwide
>>  distribution. And like all such schemes, it is extremely hard to
>>  maintain, less efficient than simply using the Web, and doomed to
>>  ultimate failure. The fact is that all such schemes are inevitably
>>  overtaken by the exponents in Moore's Law within a few years. It's
>>  far, far cheaper to just use the actual Web, as well as being more
>>  robust.
>>>  All of these are important naming schemes, and they should all
>>>  be supported by some systematic scheme instead of being built
>>>  by an ad hoc kludge (such as a special-purpose PHP program).
>>  By your conventions, any programming counts as an ad-hoc kludge. PHP
>>  is widely used, and so are several other redirecting systems. They
>>  all work within the overarching architectural and semantic specs
>>  which define the Web. They *work*, to repeat. Why are they kludges??
>>  Pat
>>>  John
>Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/ 
>Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/ 
>Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
>Shared Files: http://ontolog.cim3.net/file/
>Community Wiki: http://ontolog.cim3.net/wiki/
>To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
>    (026)

IHMC            (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola                       (850)202 4440   fax
FL 32502                        (850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes    (027)

