FYI: (01)
---------- Forwarded message ----------
From: Rubin, Ken <ken.rubin@xxxxxx>
Date: Mon, Nov 16, 2009 at 10:26 AM
Subject: RE: 1st draft of an RFP requesting API for Knowledge Bases
To: "edbark@xxxxxxxx" <edbark@xxxxxxxx>, "hugues.vincent"
<hugues.vincent@xxxxxxxxxxxxxxx>
Cc: "ontology@xxxxxxx" <ontology@xxxxxxx>, Alan Honey
<aphoneysys@xxxxxxxxx>, "jobst.landgrebe@xxxxxxxxx"
<jobst.landgrebe@xxxxxxxxx>, "healthcare@xxxxxxx"
<healthcare@xxxxxxx>, "Solbrig, Harold R." <Solbrig.Harold@xxxxxxxx> (02)
Broadening the pool a bit to include the healthcare list. (03)
Ed: I'll start with the end so we don't argue the semantics in the
middle. I agree with your assertions around aligning the work in
advance and avoiding silo-building. Frankly, the remainder are
probably smaller nuance items. (04)
The one point that I think merits a mention is your assertion around
the role of an information model. In a "classic IT sense" the Info
model drives toward design of persistence, but at least within the
Health domain that core use has morphed over the past several years.
I see an information model as a shared representation of concept
understanding. If anything, the "lines" between an information model
and underlying concept/terminology models are ever blurring, driven
more by the ability to achieve a cross-party consensus on consistency
(represented in the "info model") versus diversity (in the
terminology). (05)
Your penultimate assertion, however, is spot on. If we come through
this opportunistic time without the ability for these different
standards to align, if not directly complement each other, we've missed
the boat. (06)
- Ken (07)
-----Original Message-----
From: Ed Barkmeyer [mailto:edbark@xxxxxxxx]
Sent: Friday, November 13, 2009 1:49 PM
To: hugues.vincent
Cc: ontology@xxxxxxx; Alan Honey; Rubin, Ken; jobst.landgrebe@xxxxxxxxx
Subject: Re: 1st draft of an RFP requesting API for Knowledge Bases (08)
All, (09)
I can't resist sticking my oar in on this. (010)
First observation: Terminology models and "dictionary services" are
becoming a cottage industry. There is a need for a set of cohesive
standards, but not a need for four diverse standards activities, e.g.,
in ISO TC37, in JTC1/SC32 (ISO 11179), in TC184/SC4 (ISO 29002), and
OMG CTS (and SBVR and KDM). Some, perhaps all, of these activities
are somewhat misguided, because each is projecting its own viewpoint
on the world. (011)
Second observation: Hugues quotes from the CTS RFP: (012)
> Terminologies (also designated as controlled vocabularies) are concept
> centric, i.e. they provide a set of concepts, representations of these
> concepts (designations and codes), definitions of their meanings and
> binary relationships of the concepts to each-other. They are not
> primarily used to represent knowledge, but to provide concept
> representations, the basic elements for computational semantics.
> Ontologies provide knowledge representation systems used to represent
> knowledge in a machine storable and interpretable manner which allows
> machine-based syntactic deduction ("reasoning").
> (013)
Ontologies are concept-centric, i.e. they provide a set of concepts,
designations for the concepts, and formal or informal definitions of
their meanings in terms of other concepts. In ontologies written in
Description Logic languages, concepts are commonly described as being
of two kinds: "classes", which characterize individuals, and
"properties", which characterize relationships between individuals.
In First-Order Logic languages, concepts are characterized by
"relations" (which are just the generalization of "class" and
"property"), and "axioms" which are assertions about individuals in
terms of the relations they satisfy. (014)
Note the parallels, in particular the parallel to Description Logic.
While ontologies (in the narrow sense used in OMG) are nominally about
reasoning, the ability to reason with them is based on the language in
which they are written, and not on the nature of the knowledge that is
captured. That is, a terminology whose terms and definitions are
captured in a formal language is a (formal) ontology. A terminology
whose definitions are captured only in unstructured natural language
is a "weak ontology". Many published OWL models and most UML
"information models", along with IMM E-R models and ORM models, are
weak ontologies: the only relationships they capture that are suitable
for reasoning
are subsumption (subtype) and cardinality constraints (association end
multiplicities). Per ISO 704, terminological definitions capture
subsumption; they may or may not capture some cardinality constraints. (015)
So there is a kind of continuum here: terminological dictionary,
taxonomy, information model, DL ontology, FOL ontology. Terminology
services can operate effectively on all of them. DL services operate
on the ontology to deduce unstated relationships among classes and
properties (not usually individuals). Data services operate on any of
(taxonomy, information model, DL ontology) and an associated
data/knowledge base of facts about individuals to provide asserted
facts and mechanically derived facts about individuals. Reasoning
services operate on an ontology and an associated knowledge base of
assertions about individuals to infer facts about individuals. (016)
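The continuum and the reach of each service category described above can be sketched as a small lookup table. This is only an illustration: the names `ModelKind`, `SERVICE_SCOPE`, and `services_for` are hypothetical, and two rows are assumptions rather than statements from the text, namely that "DL services" apply only to DL ontologies and that "reasoning services" apply to both DL and FOL ontologies.

```python
from enum import IntEnum


class ModelKind(IntEnum):
    """Points on the continuum, ordered from least to most formal."""
    TERMINOLOGICAL_DICTIONARY = 1
    TAXONOMY = 2
    INFORMATION_MODEL = 3
    DL_ONTOLOGY = 4
    FOL_ONTOLOGY = 5


# Which kinds of model each service category can operate on.
SERVICE_SCOPE = {
    # Terminology services operate on every kind of model.
    "terminology": set(ModelKind),
    # Assumption: DL services need a DL ontology.
    "dl": {ModelKind.DL_ONTOLOGY},
    # Data services: taxonomy, information model, or DL ontology,
    # together with a data/knowledge base of facts about individuals.
    "data": {
        ModelKind.TAXONOMY,
        ModelKind.INFORMATION_MODEL,
        ModelKind.DL_ONTOLOGY,
    },
    # Assumption: reasoning services need a formal (DL or FOL) ontology.
    "reasoning": {ModelKind.DL_ONTOLOGY, ModelKind.FOL_ONTOLOGY},
}


def services_for(kind):
    """Return the sorted names of service categories applicable to `kind`."""
    return sorted(name for name, scope in SERVICE_SCOPE.items() if kind in scope)


if __name__ == "__main__":
    for kind in ModelKind:
        print(kind.name, services_for(kind))
```

Under these assumptions a plain dictionary supports only terminology services, while a DL ontology supports all four categories.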
So, if we are going to draw lines between these things, we all have to
have an agreed-upon terminology and an agreed-upon model for kinds of
information and kinds of information schemata. The text of CTS2
excerpted above makes the erroneous assumption that formal information
modeling languages are not primarily about concepts. Information
modeling, which includes all ontologies, is _only_ about concepts and
designations and relationships among concepts. The distinction is in
how those are captured and to what end. (017)
Third observation: The services characterize the purpose of the
information model. The purpose of a terminology is to support human
comprehension of designations and to support the translation of
designations from one language (natural or formal) to another. The
traditional purpose of an information model is typically to design a
data repository or a message suite. The nominal purpose of a formal
ontology is to support some kind of inferencing, either about terms,
or about individuals. But it should be noted that formal information
models, including ontologies, are used to translate data element
languages, to design or interpret data repositories, to design or
interpret message suites, and to characterize automated "services" in
an SOA environment, very little of which involves any "reasoning". (018)
So "Terminology services" should apply to arbitrary information
models, from dictionaries up to formal ontologies. Service support
for each of the other categories would presumably include services
that don't apply to less structured categories of information model. (019)
And it is exactly that distinction that makes having an architecture
for these things meaningful. The nature of an "ontology" -- an
information model -- is what it enables as built. An RDF triple store
is just a poor relational database unless the services can do
something more interesting than joins. (020)
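As a rough illustration of the triple-store point, the sketch below contrasts a lookup that only matches stored triples with a service that also follows subsumption. The triples, the predicate names `type` and `subClassOf`, and both function names are invented for the example and are not taken from any RDF library.

```python
# A toy in-memory triple store: (subject, predicate, object).
TRIPLES = {
    ("Aspirin", "type", "Analgesic"),
    ("Analgesic", "subClassOf", "Drug"),
    ("Drug", "subClassOf", "Substance"),
}


def asserted_types(triples, individual):
    """Plain lookup: only the classes explicitly stated for the individual."""
    return {o for s, p, o in triples if s == individual and p == "type"}


def inferred_types(triples, individual):
    """Follow subClassOf links from the asserted classes to every superclass."""
    found = set(asserted_types(triples, individual))
    frontier = list(found)
    while frontier:
        cls = frontier.pop()
        for s, p, o in triples:
            if s == cls and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


if __name__ == "__main__":
    print(sorted(asserted_types(TRIPLES, "Aspirin")))
    print(sorted(inferred_types(TRIPLES, "Aspirin")))
```

The stored data is the same in both calls; what differs is the service applied to it, which is the distinction the paragraph above draws.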
Finally, therefore, I think there is a common subset of "ontology
repository services" and CTS services. But like ISO 29002, CTS will
be concerned about the relationships between designations for the same
concept in different languages, both formal (codes) and natural
(jargon), whereas ontology services will not typically call out that
set of concerns as having any great significance. In ontologies,
sameAs is just another concept-to-concept relationship, and codes are
datatype properties of things or classes. (021)
Hugues says:
> Next, we could indeed highlight the discrepancies between the two RFPs but
> mainly CTS2 deals with terminologies and API4KB with knowledge representation
> and reasoning on it. Indeed, reasoning is one of the main points, if not *the*
> main point, in API4KB.
> (022)
I agree that it may be *the* main point as written, but I don't think it
*is* the main point of many existing corpora that are written in OWL
and RDF, and there are many service concepts related to other purposes
of formal information models that go beyond what one might expect a
"terminology" to support. The question is more one of the scope of
CTS2 than the "focus" of API4KB. If the scope of CTS2 is all of ISO
11179-3 ed3, then the distinctions are going to be almost entirely
about reasoning. (023)
> So, I tend to consider that a CTS2 and API4KB are both of interest. (024)
I agree completely. (025)
> They may be overlapping in the sense that a CTS2 implementation may be used
> by an API4KB implementation to access to terminologies ("basic elements for
> semantics"), even if that would be a clumsy implementation!
> (026)
This is the wrong way to think about it. Per the recently proposed
"future standards architecture for ISO TC184/SC4", the
"meta-relationship" between an ontology declaration and a
"terminological entry" in a "pure terminology" is just a URI, or some
similar form of bibliographic citation. It means: this symbol in the
formal language represents (or formalizes) the same concept as this
term in the dictionary. In most cases, the orthography in the
ontology will be similar, the ontology will provide an "annotation"
that replicates or paraphrases the natural language definition, and it
may provide a formal definition that is equivalent. That is why an
ontology repository is perfectly capable of supporting some
terminology services directly. (027)
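A minimal sketch of that "meta-relationship", assuming a hypothetical in-memory repository: each ontology declaration carries the URI of the terminological entry it formalizes, plus an annotation paraphrasing the natural language definition. The class names, field names, and the example.org URI are all invented for illustration and do not come from CTS2, API4KB, or the SC4 architecture document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyDeclaration:
    symbol: str                 # the symbol used in the formal language
    annotation: str             # paraphrase of the natural language definition
    term_entry_uri: str         # citation of the terminological entry
    formal_definition: Optional[str] = None  # optional equivalent formal definition


class OntologyRepository:
    """Answers simple terminology-style queries from ontology declarations."""

    def __init__(self, declarations):
        self._by_symbol = {d.symbol: d for d in declarations}
        self._by_uri = {d.term_entry_uri: d for d in declarations}

    def definition_of(self, symbol):
        """Return the natural language definition recorded for a symbol, if any."""
        decl = self._by_symbol.get(symbol)
        return decl.annotation if decl else None

    def symbol_for_entry(self, term_entry_uri):
        """Return the symbol that formalizes a given terminological entry, if any."""
        decl = self._by_uri.get(term_entry_uri)
        return decl.symbol if decl else None


if __name__ == "__main__":
    repo = OntologyRepository([
        OntologyDeclaration(
            symbol="Analgesic",
            annotation="A drug that relieves pain.",
            term_entry_uri="http://example.org/dictionary/analgesic",
        ),
    ])
    print(repo.definition_of("Analgesic"))
    print(repo.symbol_for_entry("http://example.org/dictionary/analgesic"))
```

Here the repository answers a definition lookup and a term-to-symbol lookup without calling a separate terminology service, since the link between the two is only a stored citation.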
But underlying the verbiage, a primary Healthcare intent for CTS is
about "data element dictionaries" -- the relationship between _codes_
and jargon terms and natural language definitions. That is the
substance of the model in ISO 11179-3:2004 (which is what the NCI
people use for this). "Codes" are designations for the concept in a
different kind of formal language, and that is not an ontological
concept, although it is often an information modeling concept. (028)
So for OMG, the requirement should be that the same service has the
same service specification in both standards. The trick is to sort
out the truly terminology services that are common to ontologies,
dictionaries and data element dictionaries. And all of that should be
coordinated with the (still draft) information model of that stuff in
JTC1/SC32.
Alternatively, we can ask ourselves how many more silos we want to produce. (029)
-Ed (030)
--
Edward J. Barkmeyer Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263 Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263 FAX: +1 301-975-4694 (031)
"The opinions expressed above do not reflect consensus of NIST,
and have not been reviewed by any Government authority."
_________________________________________________________________
Msg Archives: http://ontolog.cim3.net/forum/health-ont/
Community Files: http://ontolog.cim3.net/file/work/health-ont/NHIN-RFI/
To Post: mailto:health-ont@xxxxxxxxxxxxxxxx
Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?NhinRfi (032)