ontolog-forum

Re: [ontolog-forum] Topic maps and the "wheel" of "logical semantics": w

To: Pat Hayes <phayes@xxxxxxx>
Cc: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Patrick Durusau <patrick@xxxxxxxxxxx>
Date: Tue, 01 May 2007 14:14:47 -0400
Message-id: <46378397.8080206@xxxxxxxxxxx>
Pat,    (01)

We may actually be making progress! See comments below.    (02)

Pat Hayes wrote:    (03)

>> Pat,
>
>
> <snip>
>
>>>> such that all the information about that subject can be collocated 
>>>> to a single location.
>>>
>>>
>>>
>>> Aside from the above, this seems confused. In general, it is not 
>>> possible to collocate *all* the information about anything in one 
>>> location. Even encyclopedia articles are not *all* the information 
>>> about a thing. All the written biographies of Charles Darwin, say, 
>>> are not *all* the information about him. And even if this were 
>>> possible, why would one want to do this? What engineering or 
>>> pragmatic  purpose would it serve? Pointers were invented so that 
>>> one need not do this.
>>>
>> Well, I don't know if it is your "logic-fixation" or simply bad prose 
>> on my part but the "all" in that statement referred to all the 
>> information on a subject in a topic map. I wasn't claiming and I 
>> don't really think you thought I was, that a topic map contains *ALL* 
>> the information known about a subject.
>
>
> Well, that is a relief. :-)
>
>> Not real sure how one would make a judgment of that sort anyway.
>>
>> As far as the pragmatic purpose, consider the following example. I 
>> recently did a paper under contract that compared topic maps and 
>> record linkage, a technique that arose in vital records in the late 
>> 1950's. In doing that research, I discovered that the same technique 
>> has been studied under the following names:
>>
>> merge/purge, de-duplication, hardening of soft databases, reference 
>> matching, object identification, identity uncertainty, entity 
>> heterogeneity, entity identification, object isomerism, instance 
>> identification, entity reconciliation, list washing, data cleaning, 
>> deduplication and co-reference resolution
>>
>> I should note that more than one mathematical model has been 
>> developed for the technique, apparently due to failure to discover 
>> that the work had already been done under another name.
>>
>> Gee, you pose a tough question. Why would I possibly want to have 
>> information about that particular technique gathered together, 
>> without regard to the means used to identify it? Hmmm, well, perhaps 
>> so that I would not duplicate research done under another name for 
>> the same subject?
>
>
> It doesn't need to be in the same place for that, it only needs to be 
> accessible. Sitting here in my front room I can access information 
> stored in China or Vietnam or Spain.
>
Ah, bad prose on my part (again).    (04)

The original topic maps standard (ISO 13250:1999) went to great lengths 
to point out that the notion of a "single location" for all the 
information in a topic map means merely that, however a subject is 
identified, a user is presented with all the information known 
about that subject in the topic map. The standard expresses no opinion 
on how that happens.    (05)

You are quite correct that the information presented to you may indeed 
be stored in any number of locations. The important point is that, 
however the subject has been identified and, I suppose I should say, 
however that information is stored or wherever it is located, a topic 
map provides a mapping between the different identifications such that you 
get all the information about that subject (known to the topic map author).    (06)
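
As a rough illustration only (nothing below comes from the standard): 
the idea can be sketched in Python with two stores in different places, 
each filing its material under its own label, and a mapping recorded by 
the topic map author. The store contents, the key "subject-1" and the 
function name collect are all made up for the sketch.

```python
# Hypothetical stores in two different "locations", each using its own label.
STORE_A = {"record linkage": ["item filed under 'record linkage'"]}
STORE_B = {"merge/purge": ["item filed under 'merge/purge'"]}

# The mapping recorded by the topic map author: identification -> subject key.
# "subject-1" is an arbitrary internal key, not a "true name" for the subject.
SAME_SUBJECT = {
    "record linkage": "subject-1",
    "merge/purge": "subject-1",
    "de-duplication": "subject-1",
}


def collect(identification, stores=(STORE_A, STORE_B)):
    """Return everything the map knows about a subject, whichever label is used."""
    subject = SAME_SUBJECT.get(identification)
    if subject is None:
        return []  # the map's author recorded nothing for this identification
    # Every label the author mapped to the same subject.
    labels = [label for label, key in SAME_SUBJECT.items() if key == subject]
    found = []
    for store in stores:
        for label in labels:
            found.extend(store.get(label, []))
    return found
```

Asking for "de-duplication" returns the items from both stores, even 
though neither store uses that label; where and how the items are kept 
is irrelevant to the user.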

>> So that I would not have to duplicate the matching that had been done 
>> by someone else between one or more of these terms? (Noting that I 
>> don't guarantee that a search on any of these terms will *always* 
>> return information that is about the same subject as the term "record 
>> linkage." Not only do people use different terms but sometimes a 
>> single term is used differently by different people.)
>>
>> If that sounds too "academic," imagine you are in a hospital 
>> emergency room and your physician misses critical information that 
>> would have been beneficial but was missed because they did not use 
>> the terminology necessary to find it.
>
>
> Something very close to this actually happened to my wife (she was 
> given an antibiotic to which she is allergic).
>
>>
>>
>>>> That's it. No attempt to specify a semantic theory or reasoning, etc.
>>>
>>>
>>>
>>> But that *is* the beginnings of a semantic theory. The first stage 
>>> of any semantics is the mapping between referring expressions and 
>>> their denotations, and the most basic semantic relationship is that 
>>> of identity. Peirce built his whole logic around identity mappings. 
>>> Really, Topic Maps are not anything fundamentally new; they are a 
>>> very old idea wrapped up in a new notation and terminology.
>>>
>> Actually topic maps or at least the behavior that topic maps 
>> represent, the mapping of different identifications of the same 
>> subject, is far older than Peirce. How do you think people do 
>> translations?
>>
>>> Logic does not *require* reasoning; but in any case, deriving the 
>>> conclusion that <this>=<that> is a very simple kind of reasoning.
>>>
>> Oh, part of your logic-fixation again. Yes, you can characterize a 
>> mapping of <this>=<that> as simple logic if that makes you feel any 
>> better.
>
>
> It's not a case of feeling better, but of whether or not this is a 
> logical inference; and it is. Quite a lot of OWL applications are used 
> almost exclusively to come to conclusions of this form, using 
> inverse-functional properties. Quite a lot is known about techniques 
> of (reasonably) efficient equality reasoning, which I imagine would be 
> directly applicable to your area of concern. I think it's you who has 
> logic-phobia. :-)
>
Sorry, not guilty!    (07)

The Topic Maps Reference Model does *not* specify or require any 
particular method for determining that two (or more) identifications are 
of the same subject. (If we failed in that regard I would like to know 
so we can fix it.)    (08)

Let's take another specific example: The gene for the chemokine 
lymphotactin was discovered by three different groups, who 
promptly named it SCM1, ATAC and LTN. It is also known as LPTN and has 
an "official" name of XCL1.    (09)

There are at least two ways that we could devise a mapping that would 
result in a "merging" of those different identifications.    (010)

1) A researcher reading the literature notices that ATAC identifies the 
same subject as XCL1 and enters that mapping in a topic map. (I am 
grossly oversimplifying; the string "ATAC" is obviously inadequate for 
identifying this gene. A Google search today returns some 
4,900,000,000 "hits" and not all of them are about this gene.)    (011)

2) Assuming the existence of an ontology upon which to base inferencing 
(yes, I know gene ontologies exist) and assuming that all these terms 
have been mapped to that (note singular) ontology, then you could use 
an inferencing engine to produce the same result.    (012)

As I said above, topic maps are not constrained to use only one or the 
other.    (013)
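
A minimal sketch of that point, in Python and under stated assumptions: 
both kinds of evidence feed the same merge step, here a small union-find 
over name strings. The class name Merger, the identifier "onto:0001" and 
the assumption that the four names were mapped to one ontology entry are 
invented for illustration; bare strings are, as noted above, too weak to 
identify a gene in practice.

```python
class Merger:
    """Tiny union-find over identifications; any source of evidence may call merge()."""

    def __init__(self):
        self.parent = {}

    def find(self, name):
        # Unknown names start out as their own representative.
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def merge(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def same_subject(self, a, b):
        return self.find(a) == self.find(b)


m = Merger()

# Way 1: a reader asserts a mapping noticed in the literature.
m.merge("ATAC", "XCL1")

# Way 2: a rule infers mappings from a shared ontology entry.
# The identifier "onto:0001" is made up for this sketch.
ontology_id = {
    "SCM1": "onto:0001",
    "LTN": "onto:0001",
    "LPTN": "onto:0001",
    "XCL1": "onto:0001",
}
first_seen = {}
for name, oid in ontology_id.items():
    if oid in first_seen:
        m.merge(name, first_seen[oid])  # same ontology entry, so same subject
    else:
        first_seen[oid] = name
```

After both steps m.same_subject("ATAC", "LTN") holds, although no single 
source stated it: the human assertion links ATAC to XCL1 and the rule 
links XCL1 to the rest.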

So, why would I ever want to use #1 and not #2? Well, let's look at 
"The success (or not) of HUGO nomenclature," Genome Biology 2006, 7:402 
(http://genomebiology.com/2006/7/5/402), which is the source of the 
foregoing example. The authors found that there is no clear tendency for 
authors to adopt official symbols, and that 14% of genes are never 
referenced using the official symbols.    (014)

That doesn't bode well for any system, such as an inferencing engine, 
that depends upon a particular nomenclature mapping for success. That is 
*not* a dig at logic or inferencing engines, just recognition that the 
initial condition for success, a common ontology to which genes are 
mapped, is highly unlikely.    (015)

So, it isn't really logic-phobia but facing what I think you would call 
an "objective fact," that is that people are going to identify the same 
subject differently and that includes the ontologies that they will 
choose (or not) to use. If you have the information necessary to support 
an inferencing engine, by all means, use one in your topic map.    (016)

Or to put it another way, if you can support an "intelligent agent," 
then do so. But don't overlook "intelligent users" while waiting for 
"intelligent agents" to arrive on the scene.    (017)

>>>> Anyone can use any basis for identifying a subject and any basis 
>>>> they like for saying that two or more identifications do indeed 
>>>> represent the same subject.
>>>
>>>
>>>
>>> Just as they can in a first-order logical semantic framework.
>>>
>> Oh, you mean like Peirce did in identifying a thief and the porter 
>> who stole his watch as the same person?
>
>
> I don't know that anecdote, sorry.
>
The account of it that I am most familiar with is found in: "The Sign of 
Three" by Eco and Sebeok. (pp. 11-16) Basically Peirce had his coat and 
watch stolen while on a steam ship and he had all the porters lined up 
so he could try to determine which one was the thief. In his telling of 
the story he admits that he had no basis on which to choose any of them 
but simply picked one for no discernable reason as the guilty party. It 
does turn out that he was correct.    (018)

>> But more to the point and what seems to be completely ignored in this 
>> discussion, is that most people don't use first-order semantic 
>> frameworks to make those mappings.
>
>
> Ah, this is the nub of the matter. I reject this claim. True, people 
> don't formalize their conclusions to themselves in a classical 
> textbook first-order notation; but are they in fact using first-order 
> valid inferences? I think in fact they often are. Bear in mind that 
> machine inference is not done by using a textbook-style display of a 
> linear proof sequence, A following from B by a logical inference rule: 
> it is done by heuristic techniques generating a semantic tableau, or 
> by a Davis-Putnam process, or the like. But it is still first-order 
> inference. And humans reason in ways that seem to be almost directly 
> first-order, including when deciding identities. Bill was wearing a 
> yellow jacket; very few people wear yellow jackets; the only person I 
> can see wearing a yellow jacket is that guy over there; that guy is 
> probably Bill. This is a first-order logical inference. Or: Joan 
> should be here by now; Joan hasn't phoned; if Joan had known she was 
> going to be late she would have phoned; so, something unexpected must 
> have delayed her. That is a first-order logical inference. And so on, 
> and on. I don't want to claim that *all* human reasoning is 
> first-order: but a surprisingly large amount of it seems to be.
>
Well, I was using your "first-order semantic frameworks" in the sense of 
formal application, without reference to their thinking processes being 
first-order. It may well be that a large amount of it is first-order, as 
you say. But that elides the issue that, so far as I know, subject to your 
gentle correction, there is no generalized inferencing engine that comes 
close to those first-order processes when performed by a human agent.    (019)

>> Agreed, such mappings can be made in your first-order semantic 
>> framework. So what? Given the limited number of people who use them, 
>> how successful are they going to be in generating the number of 
>> mappings required?
>
>
> I think that any framework which is less expressive than FOL isn't 
> going to stand a chance of mirroring the inferences that people 
> routinely make, almost every minute they are awake, without 
> necessarily being consciously aware of it.
>
> <snip>
>
>>>> You are free, of course, to use ontologies or logic for such 
>>>> identifications or mappings, but such tools are not required.
>>>>
>>>> Judging from the current state of finding information on any given 
>>>> subject on the WWW, the wheel of "logical semantics" has been 
>>>> ignored, is broken or has other concerns.
>>>
>>>
>>>
>>> It has other concerns. Semantics isn't concerned with finding 
>>> information in a network. But I don't seem to find Topic maps of any 
>>> use in this task either. The way to find information is to use a 
>>> very large hash table, such as Google.
>>>
>> Just because you are not concerned with a problem, such as finding 
>> information (whether networked or not, topic maps are not limited to 
>> networked information), doesn't mean that it isn't real or doesn't 
>> merit a solution.
>
>
> Oh, I entirely agree.
>
>> That you would suggest very large hash tables shows you haven't 
>> devoted much thought to the problem. Hash tables, large or otherwise, 
>> are not in and of themselves an answer to the problem of different 
>> identifications for the same subject or the same identification 
>> being used by different people for different subjects.
>
>
> Of course not. I agree this is an important issue; but topic maps 
> simply record this, as far as I can see. To record an equation is 
> easy: it can be done in almost any formalism.
>
Sorry, topic maps simply record .... what?     (020)

>> Moreover, with a topic map I can record my mapping between different 
>> identifications for the same subject, which would be a benefit to the 
>> next person who searches for that subject under any of its 
>> identifications.
>
>
> If I follow what you mean here, we also invented a notation for this 
> in IKL. It's basically the use of typed literals, where the 'datatype' 
> is the identification mapping. But I agree, having a very general 
> notation for this is useful.
>
> So let me see if I understand this as you explain it. A topic map is 
> basically a complex name for a thing, one which records a variety of 
> 'superficial' names, each used in a different context of 
> identification to refer to the same thing. The TM records both the 
> fact of these names being coreferents and records, and links to, the 
> various contexts of identification where the superficial names are 
> used.  So one might express it as a collection of <namestring, 
> identification-context> pairs. Is that right?
>
Yes, modulo that a topic map is a collection of such "complex names" for 
things, where the "complex name" (in the reference model we call them 
proxies) can also include any other properties that are associated with 
a thing.    (021)
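
One hedged way to picture such a proxy, in Python: a set of 
(namestring, identification-context) pairs plus a bag of other 
properties. This is an illustration only, not the Reference Model's 
formal definition; the class name Proxy, the method merged_with, the 
context labels and the property keys are all invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """Illustrative stand-in for a subject; not the Reference Model's definition."""
    names: set = field(default_factory=set)         # (namestring, context) pairs
    properties: dict = field(default_factory=dict)  # any other properties

    def merged_with(self, other):
        # Merging keeps every name pair and every property from both proxies;
        # on a key clash the other proxy's value wins in this simple sketch.
        return Proxy(self.names | other.names,
                     {**self.properties, **other.properties})


# Hypothetical contexts and properties, chosen only to show the shape.
a = Proxy({("ATAC", "context-A")}, {"kind": "gene"})
b = Proxy({("XCL1", "context-B")}, {"product": "lymphotactin"})
merged = a.merged_with(b)

# A topic map, in this picture, is just a collection of such proxies.
topic_map = [merged]
```

The merged proxy answers to either name in its own context and carries 
the properties contributed under both.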

>> Topic maps are in use by such diverse organizations as the 
>> Norwegian Post Office, the IRS, the Office of Naval Intelligence and 
>> the Y-12 Complex at Oak Ridge, as well as the Danish Royal Library. 
>> To name only a few.
>>
>> Google isn't a solution. It is a symptom of the problem faced by 
>> anyone who wants to find information about a subject however it is 
>> identified.*
>
>
> It is a vitally important tool, however. And it solves a whole lot of 
> problems: the Web wouldn't be nearly so useful without it.
>
Oh, sure, I use it every day. Multiple times each day. Doesn't keep me 
from wanting something better.    (022)

>> *(The SW proposal that every subject have a single unique identifier
>
>
> Whoa, that isn't the SW proposal, and it's not the TAG Web architecture 
> position either. Of course there isn't a single unique identifier, in 
> general: if there were, owl:sameAs would be vacuous. I agree, the idea 
> of single unique 'true name' is ridiculous. I call it the EarthSea 
> theory of reference, after the idea in the Ursula LeGuin novels.
>
Ok, so we agree that a "true name" is ridiculous. Great!    (023)

I really think our difference is one of emphasis, if that. You want to 
say that users who create mappings between different identifications of 
the same subject are using "first-order" processes. If you want to 
describe their activity that way, I have no objection as it is your 
description. Where I would object is telling users that they have to use 
explicit "first-order" processes to make those mappings. (Noting that I 
certainly agree you can use an inferencing engine to make the same 
mappings, well, assuming you can find one as robust as a human user. 
Your mileage may vary.)    (024)

Hope you are having a great day!    (025)

Patrick    (026)

-- 
Patrick Durusau
Patrick@xxxxxxxxxxx
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Member, Text Encoding Initiative Board of Directors, 2003-2005    (027)

Topic Maps: Human, not artificial, intelligence at work!     (028)



