
RE: [ontolog] UBL proposals for codesets?

To: "'ontolog@xxxxxxxxxxxxxxx'" <ontolog@xxxxxxxxxxxxxxx>
From: "Miller, Robert (GXS)" <Robert.Miller@xxxxxxx>
Date: Mon, 28 Oct 2002 11:01:11 -0500
Message-id: <3D808EC801AED111B0100008C75D5DDC15B725BF@xxxxxxxxxxxxxxxxxxxxxxx>


As I've previously reported directly to the UBL team, I find fault with the definition of code given in the attached paper:

        "A code is a representation of something, normally text, abbreviating it to a shortened, encoded form, and usually of the same consistent length within a code list.  In essence it is an abbreviation."

The implication I read in this definition is that, from a semantic viewpoint, the 'text' the code represents is the end of the line.  In reality, a code is a pointer to a collection of semantic information, at least some of which is likely to be significant to the application processing the message in which the code is embedded.  I've studied the existing X12 code lists at some length, and have yet to find one that does not identify semantic properties beyond those conveyed by the 'text' used to describe the code.

I have also observed cases where the semantic property identified by a value in one code list is also identified by a value in another code list, and again by the definition of an entity.  That is, I've found three ways in X12 syntax to represent the same semantic property.  But X12 provides no unique identifier for such properties, so there is no way to relate the three representations to one another.

I observe that UBL has likewise failed to recognize that codes stand in as identifiers of semantic properties, and so has likewise failed to provide a means of associating coded entries with unique semantic identifiers.

As an example of the X12 deficiency in uniquely identifying semantic properties, I point you to the DMG Demographic Information segment, element DMG02 (DE 1251 Date Time Period), and its associated semantic note:

"DMG02 is the date of birth."  Presumably this is the same semantic property of date that is identified by DE 374 Date/Time Qualifier code value '222', described as 'Birth'.  One may also find date property qualifiers defined by semantic notes on elements but absent from the DE 374 code list.

If interoperability among varying syntactic representations of business information is to be realized, we must provide a means to at least express the equivalence of semantic representations.  It would be nice if we could also express near equivalences, such as the Great Britain / United Kingdom code list entries mentioned in the comment below.  Current practice has failed to recognize the problem, and so cannot begin to address it.
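A minimal sketch of what such a means might look like, written in Python purely for illustration. It assumes a hypothetical registry in which each syntactic representation (an element such as DMG02, or a code list value such as DE 374 '222') is mapped to a unique semantic identifier. The identifiers (sem:date-of-birth and so on), the country code list names, and the class names are invented here; they are not taken from X12 or UBL. Two representations are treated as equivalent when they map to the same identifier, and near equivalence is declared explicitly between identifiers.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Representation:
    """One syntactic way of conveying a meaning (hypothetical model)."""
    syntax: str      # e.g. "X12"
    source: str      # element or code list, e.g. "DMG02" or "DE374"
    value: str = ""  # code value; empty when the element itself carries the meaning


class SemanticRegistry:
    """Hypothetical registry mapping representations to unique semantic identifiers."""

    def __init__(self) -> None:
        self._meaning: Dict[Representation, str] = {}
        self._near: Set[FrozenSet[str]] = set()

    def register(self, rep: Representation, semantic_id: str) -> None:
        self._meaning[rep] = semantic_id

    def declare_near(self, id_a: str, id_b: str) -> None:
        # Near equivalence is symmetric, so store it as an unordered pair.
        self._near.add(frozenset((id_a, id_b)))

    def equivalent(self, r1: Representation, r2: Representation) -> bool:
        # Exact equivalence: both representations point at the same identifier.
        m1, m2 = self._meaning.get(r1), self._meaning.get(r2)
        return m1 is not None and m1 == m2

    def near_equivalent(self, r1: Representation, r2: Representation) -> bool:
        # Exact equivalence, or a declared near equivalence between identifiers.
        m1, m2 = self._meaning.get(r1), self._meaning.get(r2)
        if m1 is None or m2 is None:
            return False
        return m1 == m2 or frozenset((m1, m2)) in self._near


# Example data; the semantic identifiers and the country code lists are hypothetical.
registry = SemanticRegistry()
dmg02 = Representation("X12", "DMG02")
de374_birth = Representation("X12", "DE374", "222")
registry.register(dmg02, "sem:date-of-birth")
registry.register(de374_birth, "sem:date-of-birth")

gb = Representation("list-A", "country", "GB")
uk = Representation("list-B", "country", "UK")
registry.register(gb, "sem:great-britain")
registry.register(uk, "sem:united-kingdom")
registry.declare_near("sem:great-britain", "sem:united-kingdom")
```

With this data, registry.equivalent(dmg02, de374_birth) is true, while gb and uk are only near-equivalent.  The point of the design is that equivalence is stated once, against the semantic identifier, rather than pairwise between every code list and element that happens to carry the same meaning.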


-----Original Message-----
From: Tim McGrath [mailto:tmcgrath@xxxxxxxxxxxxxxx]
Sent: Sunday, October 27, 2002 7:45 PM
To: ontolog@xxxxxxxxxxxxxxx
Subject: Re: [ontolog] UBL proposals for codesets?

As far as distinguishing codes from identifiers, for the present we have
adopted the position as outlined in the attached paper.  This conforms
to your definition of a code.

In terms of how we intend codesets to be implemented we have a technical
solution as given in the paper...

We are also establishing preferred codesets for many of the codes
defined in the vocabulary.  For example, ISO 639 is the recommended
codeset for languages.

I personally see the choice of codeset as secondary to the semantics of
the object itself.  That is, we need to understand what a language is
and when to use it before we determine the appropriate codesets.  This
is more problematic when we qualify an object with a 'type' that is
coded: what do we mean by 'type'?  For example, in UBL we have a Type
entity within an Order document.  Is this the type of document (e.g.
Order, Invoice, Response), or is it a type of Order (Standing, Reverse,
One-off, etc.)?  It is this ambiguity that creates more problems than
the choice of codeset.  If someone uses 'GB' as opposed to 'UK' as their
country code, at least we know they are talking about the same thing
(roughly).  In these cases it is often a simple transformation, a
process most businesses do anyway for their internal to external code

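The kind of internal-to-external code transformation described above can be sketched as a simple lookup table. This is a hypothetical Python illustration: the table contents and the function name are invented, with 'UK' treated as an internal code and 'GB' (the ISO 3166-1 alpha-2 code for the United Kingdom) as the external one.

```python
from typing import Dict

# Hypothetical translation table: internal code -> external (exchange) code.
INTERNAL_TO_EXTERNAL: Dict[str, str] = {"UK": "GB"}

# Reverse table for inbound messages, built from the forward one.
EXTERNAL_TO_INTERNAL: Dict[str, str] = {v: k for k, v in INTERNAL_TO_EXTERNAL.items()}


def translate(code: str, table: Dict[str, str]) -> str:
    """Return the mapped code, or the code unchanged if no mapping is defined."""
    return table.get(code, code)


print(translate("UK", INTERNAL_TO_EXTERNAL))  # GB
print(translate("GB", EXTERNAL_TO_INTERNAL))  # UK
print(translate("FR", INTERNAL_TO_EXTERNAL))  # FR
```

Passing unmapped codes through unchanged is a choice made for this sketch; a stricter version might reject them.  Note that a table like this only handles the syntactic substitution; it does nothing about the ambiguity of a coded 'type'.
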
Leo Obrst wrote:

>We had a discussion at the UBL workshop back in June about codesets (and
>also identifiers) and how UBL should or would handle these. Has there
>been additional discussion on this, or any guidelines established, etc.?
>If so, can you point me to a document?
>By codes and codesets I mean: a code is a shorthand for some concept,
>e.g., a two- or three-character representation for a specific country.
>Another example: the two-character US state code representing (and
>abbreviating) the state, e.g., ME for Maine. In general, a code is an
>abbreviation, a more compact representation for a concept (to minimize
>storage as opposed to maximizing human readability/interpretation).
>One of the issues in ontologies and business of course is that often
>these codes (and different, possibly conflicting codesets) are used
>willy-nilly as the only representation for the concept or in the
>Dr. Leo Obrst  The MITRE Corporation
>mailto:lobrst@xxxxxxxxx Intelligent Information Management/Exploitation
>Voice: 703-883-6770 7515 Colshire Drive, M/S W640
>Fax: 703-883-1379       McLean, VA 22102-7508, USA
>To post messages mailto:ontolog@xxxxxxxxxxxxxxx
>An archive of the [ontolog] forum can be found
>at http://ontolog.cim3.org/forums/ontolog


tim mcgrath
fremantle  western australia 6160
phone: +618 93352228  fax: +618 93352142
