
Re: [ontolog-forum] ambiguity interferes with

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Jack Park <jack.park@xxxxxxx>
Date: Sat, 10 Mar 2007 17:45:54 -0800
Message-id: <FBDF150E-B363-45DD-B149-1983B8EE020B@xxxxxxx>

Thank you for the questions. At this time in history, I can point you to the IRS Tax Map [1,2], crafted by Coolheads Consulting, which is Steve Newcomb and Michel Biezunski, the two men responsible for bringing topic maps to the table late in the last century (I often wondered what it would feel like to say that. Now I know...). I suppose I could speak to the vast ambiguity that exists in our tax codes and suggest that perhaps their topic map fixed all that, but I have no particular experience with it. Perhaps Steve Newcomb, who contributes to Ontolog, could offer his own views. I think that's about all I can offer at this time regarding specific applications that respond to your inquiry, though I'll go out on a limb and suggest there may be many others.

In the "forthcoming" department, consider "fuzzzy.com" [3], a master's thesis project that layers a social bookmarking application on top of a topic map engine. That's a start at dealing with the complexities associated with the evolution of folksonomies, the home-grown tags for information resources. My own TopicSpaces subject map provider is being used in a project called Tagomizer, another social bookmarking application wrapped around a subject map engine. The explicit use case is the social accretion of project-centric information resources; the SRI program CALO is able to use web services to query the subject map for tagged information resources related to specific projects in which CALO users are engaged. This means that CALO users are encouraged to roam the web in search of resources specific to their needs, and then tag them accordingly. Those tags make the resources available to other CALO users. That project is up and running and being expanded at this time. Another TopicSpaces project under way is creating a subject map portal for a primate research center. Enough on my stuff. Others are out there.
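The tag-then-query pattern described above can be sketched in a few lines. This is only an illustration of the idea, not the actual Tagomizer or CALO API; the class and method names here are invented.

```python
# Hypothetical sketch of the pattern described above: users tag web
# resources with project names, and a service later retrieves every
# resource tagged for a given project. All names (TagStore, tag,
# resources_for) are illustrative, not the real Tagomizer interface.
from collections import defaultdict

class TagStore:
    def __init__(self):
        self._by_tag = defaultdict(set)  # tag -> set of resource URLs

    def tag(self, url, *tags):
        """A user bookmarks a resource under one or more project tags."""
        for t in tags:
            self._by_tag[t.lower()].add(url)

    def resources_for(self, tag):
        """What a web-service query for one project tag would return."""
        return sorted(self._by_tag[tag.lower()])

store = TagStore()
store.tag("http://example.org/paper1", "primate-research", "calo")
store.tag("http://example.org/wiki/Grooming", "primate-research")

print(store.resources_for("primate-research"))
# -> ['http://example.org/paper1', 'http://example.org/wiki/Grooming']
```

Tags shared through a store like this are what make one user's discoveries visible to every other user on the same project.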

It seems to me that the lion's share of funding has gone into RDF and semantic web technologies, none of which presently appears to include what topic or subject maps bring to the table. It is hard to imagine how so-called "competing technologies" (a gross misnomer, but still a valid perception) can penetrate such a funding environment. I am now seeing evidence that topic/subject maps, living in their own domain, which is primarily indexical, can make useful contributions.


On Mar 10, 2007, at 10:57 AM, Deborah MacPherson wrote:

RE: ...ambiguity interferes with productivity, and with eventual merging of topics for any reason.


...point to unique definitions at different nodes


...sliding scale of what is obvious, clever, or obscure.

Ambiguity also interferes with the dissemination or evaluation of truth. Unless information is from a trusted source, we are unfortunately missing a sliding-scale browser tool to adjust the amount of spin attached to information being broadcast or received. There are lots of emperors out there with no clothes, but as soon as the appropriate terminology is used, potentially incorrect digital materials are paraded in front of unsuspecting users. However, if the subject maps Jack Park is talking about were implemented, and there was

...less room for balkanized ontologies, and more demand for ontologies that serve larger purposes, merge into other domains (synergy, chance discovery).

Ambiguities could instigate showdowns between two or more sides of the same story if opposing versions got forced into the same area. Even the most untrained viewer could see the diversity and depth of information they may be dealing with using this or that word as a search term. Not one column of results à la Google, but two or more right next to each other, comparing "unique definitions at different nodes" pitted directly against each other. Truth probably lies somewhere in between.

It does not seem to be human nature to want to adopt other people's definitions for their favorite words; perhaps ambiguities are not as distracting for machines as they can be for us. There has to be some advantage in the fact that machines do not have passionate attachments to "what things mean". Subject maps made by them, even if clunky and not beautiful like some cartographic maps, would be interesting to see.

Mr. Park can you please point to some examples of subject maps that are accurate, appealing, and broad?

Do you know of any automated attempts to force many MANY versions of the same ambiguities together to be examined at critical mass in a shared semantic space?

Debbie MacPherson

On 3/10/07, Jack Park <jack.park@xxxxxxx> wrote:
Perhaps this thread should have been called "Globally unique
identifiers" or maybe "Ways to identify things".

Consider this: visit the Wikipedia page for Game_Theory
to see what happens when you use a name for something as an identifier.
It seems, these days, that people are starting to use Wikipedia URLs as
URIs, akin to saying, "Go here <wikipedia url> to see what I'm talking
about" (not unlike footnotes as identity helpers).  Back to the
Wikipedia page. Notice what happened when Game_Theory (the rock band)
came along, and Game_Theory (the board game) came along. Names for
things collapsed and ad hoc solutions (not bad ones at that!) were
invented. There's a page identified as Game_theory_(disambiguation),
and it's mentioned on each possibly ambiguous hit around Game_Theory,
just to let you know that if this page doesn't satisfy, check "here".
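The Wikipedia pattern just described can be sketched as a lookup that surfaces, rather than hides, the ambiguity. This is a minimal illustration; the page titles are examples, not an exact mirror of Wikipedia's naming.

```python
# A minimal sketch of the Wikipedia pattern described above: one name,
# several subjects, and a disambiguation step that returns all of them.
# The titles here are illustrative.
pages = {
    "Game_theory": "the mathematical field",
    "Game_Theory_(band)": "the rock band",
    "Game_Theory_(board_game)": "the board game",
}

def disambiguate(name, pages):
    """Return every page whose title begins with the ambiguous name."""
    key = name.lower().replace(" ", "_")
    return [title for title in pages if title.lower().startswith(key)]

print(disambiguate("Game Theory", pages))
# All three titles match, so a human (or a machine) still has to choose.
```

The point is that the name alone never resolves to one subject; the disambiguation page is exactly this "return all candidates" step made visible.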

I don't have the historical background to say precisely when back-of-the-book
indexes were invented, but we all use them; Amazon puts them up
(for some books) so you can get a good feel for the subjects to be found
in the book. It's not unreasonable to grant each of those subjects a
unique identifier, one perhaps coined with the book's title in it. This
means that the subject that carries the name, say, "President Bush"
might itself have a URI, but it's still going to be ambiguous. Which
President Bush would it identify? One could go so far as to say more, e.g.
President_Bush_1 (as compared to President_Bush_2), but those URIs would
not necessarily be useful to humans.
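Coining a subject identifier from a book title plus an index term, as suggested above, might look like the sketch below. The URI scheme and domain are invented for illustration; note that the ambiguity does not go away, since two books can both index "President Bush" and mean different people.

```python
# Hypothetical sketch of coining book-scoped subject URIs from index
# terms. The example.org scheme is invented, not a real convention.
def coin_uri(book_title, index_term):
    slug = lambda s: s.strip().lower().replace(" ", "_")
    return "http://example.org/books/%s#%s" % (slug(book_title), slug(index_term))

u1 = coin_uri("Decision Points", "President Bush")     # one President Bush
u2 = coin_uri("A World Transformed", "President Bush") # a different one

print(u1)
print(u2)
assert u1 != u2  # distinct URIs, yet the *name* inside them is still ambiguous
```

Two globally unique URIs now exist, but a human reading either one still cannot tell which President Bush is meant without more context.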

Nobody I know argues that URIs are not extremely valuable. But, there
are two contexts for their use, one being machine-machine interactions,
and another being human-machine interactions. As I understand things,
the semantic web is a support framework for machine-machine interactions
and, in that context, URIs should be supported.  It strikes me that
several types of mechanisms exist or are being created to deal with the
possibility that some particular subject, say, President Bush 1, might
actually have more than one URI assigned to him; to presume otherwise
would be to presume an identity server already in place, one that every
machine and human on the planet uses when deriving new URIs. That is to
say, an identity server strong enough to satisfy all possible users and uses.

I will state that I strongly believe that the URI (or is it now the IRI?)
system being used for RDF artifacts throughout the semweb community
represents a sound practice. I will argue, however, that it opens the
door to ambiguity. I believe that ambiguity interferes with
productivity, and with eventual merging of topics for any reason. If the
terrorism ontologies of the world use, say, rdf:ID="osama_bin_laden" to
name a particular person instance, then that ontology is at risk of
being dangerous when someone else using that same name comes along.

I suspect that there is going to be a need for a global identifier
generator, one that resolves ambiguity before it can happen. For that to
work, URIs (IRIs) will likely *not* contain names; perhaps they will be
closer to the random hex strings in UUID values. To do that,
there will need to be a zillion different ways of actually mapping that
new URI to the specific subject in question. Tribes have different ways
of identifying their artifacts, so, such an identity server will
necessarily need the ability to provide for the many possible ways to
identify specific subjects, and ambiguity is still going to happen.
Here, the mechanism suggested is one of a possibly institutionalized
identity server, and that server, I would argue, is simply a subject map.
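The identity-server-as-subject-map idea above can be sketched as follows: opaque UUID-based identifiers instead of name-bearing URIs, with the subject map recording the many human ways of naming each subject. All of this is a hypothetical illustration, not a real service.

```python
# Hedged sketch of the identity server described above. Identifiers
# are opaque (no name leaks into them); the subject map records the
# tribe-specific names by which each subject is known. Illustrative only.
import uuid

class SubjectMap:
    def __init__(self):
        self.subjects = {}  # opaque id -> set of known names

    def new_subject(self, *names):
        sid = "urn:uuid:" + str(uuid.uuid4())  # random, name-free identifier
        self.subjects[sid] = set(names)
        return sid

    def lookup(self, name):
        """All subjects known by this name: ambiguity is surfaced, not hidden."""
        return [sid for sid, names in self.subjects.items() if name in names]

m = SubjectMap()
bush41 = m.new_subject("President Bush", "George H. W. Bush")
bush43 = m.new_subject("President Bush", "George W. Bush")

print(len(m.lookup("President Bush")))   # 2: the shared name cannot decide
print(len(m.lookup("George W. Bush")))   # 1: a more specific name can
```

This is exactly the point made above: even with name-free identifiers, ambiguity is still going to happen at the naming layer, and the server's job is to expose it so it can be resolved.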

Even large-scale conceptual graphs will have this same issue raised from
time to time. While all our literature appeals to the toy problems
associated with early AI, e.g. "The cat sat on the mat", clearly, "Bush
spoke with Putin" is potentially, even dangerously ambiguous. Forgive
the bluntness in this, but we cannot continue using toy scenarios as we
conduct our research and development moving forward. If we are to make
the progress in, say, climate studies and cancer research that we claim
we wish to make, there will be less room for balkanized ontologies, and
more demand for ontologies that serve larger purposes, merge into other
domains (synergy, chance discovery), and are also usable in the
human-machine interaction context.

We don't have to wait for an institutionalized, quite possibly
balkanized subject identity server; we can simply start by crafting
subject maps into the web as we go. When someone uses a URI/IRI in their
rdf, just make available a mapping that provides enough properties
associated with that URI/IRI to disambiguate it from others.
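One way to picture the practice suggested above: alongside each URI/IRI, publish enough properties to tell its subject apart from look-alikes, so a reasoning engine can refuse to merge subjects whose properties conflict. The merge rule and property names below are a simplified illustration, not the topic-map standard's merging semantics.

```python
# Sketch of property-based disambiguation as described above: two
# subject descriptions merge only if their shared properties agree.
# Property names ("name", "born", "ordinal") are illustrative.
def same_subject(props_a, props_b):
    """Merge only when some properties are shared and none conflict."""
    shared = set(props_a) & set(props_b)
    return bool(shared) and all(props_a[k] == props_b[k] for k in shared)

a = {"name": "President Bush", "born": "1924", "ordinal": "41st"}
b = {"name": "President Bush", "born": "1946", "ordinal": "43rd"}
c = {"name": "President Bush", "born": "1924"}

print(same_subject(a, b))  # False: same name, conflicting birth years
print(same_subject(a, c))  # True: all shared properties agree
```

The same name with conflicting properties stays unmerged; agreeing properties license a merge. Publishing that extra property bundle with each URI is the "mapping" the paragraph above asks for.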

To ignore this, I would argue, is to balkanize the web. Patrick Durusau
and I spoke about such matters to this tribe already [1]. We argue that
subject maps, even ones written in OWL, offer a simple means to
disambiguate the web and without resort to an institutionalized, central
identity server. Simply represent what you mean by the use of some URI
in a self-documenting way such that reasoning engines can make
distinctions. We call that subject mapping.

[1] http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2006_04_27

Pat Hayes wrote:
>> Pat,
>> The global postal address (name, street, city, country)
>> was adequate as a globally unique id for data transmission
>> (snail mail) for centuries.
> Only for about two centuries, and only in some countries. It didn't
> work in China until well after WW2, for example, and there are still
> many areas of the world where it does not work.  But in any case,
> this kind of quibbling is silly: we are clearly talking about
> computerized networks.
>>   And we've had globally unique
>> ids for computer-like usage since the 1930s (Social Security
>> Numbers on punched-card machines).
> A SS number isn't going to do you much good outside the USA. Large
> though it is, the US is less than 5% of the world's total population
> (300 million in 6.7 billion).
>>  They also supported data
>> storage and transmission (by putting cards on a truck and
>> shipping them to globally unique addresses).
> Again, not a lot of use across continental distances.
>> These techniques may sound primitive by today's standards,
>> but there is nothing in principle that is different about
>> URLs except speed and convenience.
> I profoundly disagree. The chief difference in principle is that the
> transmission can be done without human aid, and at near-instantaneous
> speeds, and is now genuinely world-wide.
>>  On a related note, see
>> the proposed standard for TCP/IP links via carrier pigeon:
>>    http://www.ietf.org/rfc/rfc1149.txt
>> That may be jocular, but it could work.
>> JFS>> With the advent of computer systems, programming tools
>>>>  forced those "footnotes" to be made explicit, and they
>>>>  have done so remarkably well since the 1950s.
>> PH> I guess I don't follow what you mean here.
>> The globally unique ids such as addresses and SSN's were
>> put on the computer.  By piling up enough of them, you
>> could get a unique key for anything in the world you wanted
>> to designate.
> No, you couldn't. But even if you were right, what has this got to do
> with footnotes?
>> That technique has been used in database
>> systems for over 40 years.
>> Admittedly, the methods were not uniform across all
>> implementations, but they've been working for years.
>> PH> There were no global references before the Web.
>> No.  See above.
>> PH> Well, you can do things like this with redirects and so on.
>>>  People often don't bother in simple cases, but there is a
>>>  lot of PHP out there. For example, my URI
>> But that involves building an ad hoc redirecting kludge on top
>> of a global system instead of building an ad hoc global kludge
>> on top of a local system.
> It is not an ad hoc kludge. Redirection is part of the basic
> machinery of the internet, and has been since day one. Well, maybe
> day two. Read Roy Fielding's thesis.
>> JFS> I still remember the early days of PCs, in which names
>>>  of files and disk drives were hard coded in the programs
>>>  and people complained about those hard coded connections.
>>>  They pleaded for greater flexibility by parameterizing
>>>  the connections.  But now the SemWeb has returned to the
>>>  good (?) or bad (?) old days with hard coded connections.
>> PH> Not at all. In fact nothing could be further from the truth,
>>>  cf http://www.w3.org/Provider/Style/URI
>> That reference just recommends some naming guidelines.  What I
>> was talking about is a context-dependent reference system, which
>> could be dynamically linked into various contexts.
> So was the reference cited.
>>  That is the
>> typical programming-library notion that programmers have been
>> using since the early '60s.
>> For example, there are two fundamentally different ways of
>> linking a term, say "vehicle", into an environment:
>>  1. Globally unique: a fixed definition that is independent
>>     of any other definition for "vehicle" in any environment.
>>  2. Context dependent: the term "vehicle" is linked to whatever
>>     definition for "vehicle" is used in the current environment.
>> Method #1 is the default for the WWW, and method #2 must be
>> supported by some ad hoc kludge.
> Really, John, you should find out more about the actual Web
> architecture before saying such crazy things. URIs are globally
> unique. But that says nothing about global uniqueness of
> *definitions*, and a URI is not an address, it is an identifier. The
> process of getting from the URI to the 'place' where the identified
> resource is located - the http protocol specification - can be very
> complicated and intricate, and can easily accommodate your method #2.
> And this is not an ad hoc kludge, it is part of the most basic
> protocol definition of the entire Web, the process by which
> information is transferred across the entire internet; and in fact it
> is widely used in large professional websites.
>>  Method #2 is the default for
>> programming libraries, and method #1 is supported by a variety
>> of different methods (including the option of using a URI).
>> And I'd also like to mention a third approach:
>>  3. Unique, but mobile:  a globally unique id, which allows
>>     agents to move around the WWW while retaining their own
>>     identity, independent of the computer (or device or cell
>>     phone) on which it happens to be located
> Yes, URIs can do that, but not ...
>> -- and without
>>     any assistance (or even recognition) from the sender or
>>     the domain name servers of the Internet.
> ...this. Nor should they. Why would you want any Internet-based
> communication process to be done without assistance from the Internet
> servers? Seems crazy to me.
>> Method #3 can be simulated by implementing a virtual network
>> on top of the Internet, and I'm sure that such things will
>> proliferate over time.
> Well, there have been many such schemes. People at IHMC have one,
> which supports the CmapTools server/client system with a worldwide
> distribution. And like all such schemes, it is extremely hard to
> maintain, less efficient than simply using the Web, and doomed to
> ultimate failure. The fact is that all such schemes are inevitably
> overtaken by the exponents in Moore's Law within a few years. It's
> far, far cheaper to just use the actual Web, as well as being more
> robust.
>> All of these are important naming schemes, and they should all
>> be supported by some systematic scheme instead of being built
>> by an ad hoc kludge (such as a special-purpose PHP program).
> By your conventions, any programming counts as an ad-hoc kludge. PHP
> is widely used, and so are several other redirecting systems. They
> all work within the overarching architectural and semantic specs
> which define the Web. They *work*, to repeat. Why are they kludges??
> Pat
>> John

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx



Deborah MacPherson




