Forward from Doug based on questions asked at the OOR panel
session today.
 
Thanks, Doug!
 
 
Here were the questions, responses:
 
[14:30] LeoObrst: Question to Doug: Given Cyc's long
experience with such matters, can you provide the OOR group with what you would
suggest as a "small set of inter-ontology alignment relations"? Which
are necessary and which are desirable?
 
[14:45] doug lenat: Response to Leo: Yes, I would be
happy to provide the set of (surprisingly few) predicates we use to state those
inter-ontology correspondences.  I will send that out and/or post it today
or tomorrow.
 
See the chat session for other interesting discussion.
 
Leo
 
-----Original Message-----
From: Doug Lenat [mailto:doug@xxxxxxx] 
Sent: Thursday, April 03, 2008 5:46 PM
To: Obrst, Leo J.; Pat Hayes; Peter P. Yim
Subject: promised material on expressing the mappings Cyc to other ontologies
 
This is the material I promised during the conference call, about
mapping Cyc to other ontologies.  Please circulate or post this, as
appropriate.
 
There are two different sorts of "mapping to other
ontologies/schemata" 
that we do from Cyc:
* mapping between terms in Cyc and terms in other
ontologies
* mapping between terms in Cyc and terms in databases (or
SUBMIT-able 
web pages).  We call this SKSI for Semantic
Knowledge Source Integration.
 
 
The next two sections treat these two asterisked
processes in turn. 
 
To be clear, this is not HOW we figure out the mappings; it is the
vocabulary of CycL predicates, collections, etc. that we employ to
express those mappings formally, in a way that lets the standard Cyc
inference engine make use of information stored in those alien
sources, e.g., as part of discharging some sub-sub-...-problem.
 
There is a third sort of mapping which is NOT covered
here, namely where 
we are exporting entire sets of Cyc assertions to
languages (e.g., OWL) 
which are strictly less expressive than Cyc's
representation language, 
and need to have special ways of flattening HOL
assertions into FOPC, or 
full FOPC assertions into description logic, etc.
 
--------------------------------------------------------
 
Here's a summary of the predicates etc. we use to map to
other ontologies:
---------------------------------------------------------------------------
 
exact mapping:
 
 (synonymousExternalConcept CYC-TERM SOURCE
SOURCE-TERM)
 
CYC-TERM maps to SOURCE-TERM in SOURCE (ontology/representation
system 
where SOURCE-TERM 'lives')
 
E.g.,
 
 (synonymousExternalConcept DrinkingMug 
 LSCOMObjectAndSituationOntology "Mug")
 
;;;;;;;;;;;;;;;
 
close mapping:
 
  (overlappingExternalConcept CYC-TERM SOURCE
SOURCE-TERM)
                                                                                                                                                                                             
CYC-TERM closely maps to SOURCE-TERM in SOURCE.
                
                                                                                                                                                                             
This relation might hold under a number of
conditions:  
 
* SOURCE-TERM conflates some concepts (so that there are 
  two or more Cyc-term entries with which it
overlaps), 
 
* SOURCE-TERM is a predicate with the same meaning as
CYC-TERM, but
  has a different arg-order.
 
* SOURCE-TERM denotes a predicate with different (usually
tighter) 
  argument constraints than CYC-TERM (due to certain
contextual assumptions 
  built into SOURCE ontology).
 
So overlappingExternalConcept assertions are sometimes
inferred from 
more specific mapping predicates, e.g.,
 
;;;;
 
Different Arg Constraints:
 
 (synonymousExternalPredWRTTypes CYC-PRED SOURCE
S-PRED TYPE1 TYPE2)
 
* S-PRED denotes a specialization of CYC-PRED, where the
specialization is defined by tightening the constraints on the 1st
and 2nd arguments, as specified by TYPE1 and TYPE2.
 
E.g.,
 
 (synonymousExternalPredWRTTypes dateOfDeath
JRC-EMMOntology 
  "dateDeath" Person Date)
 
The JRC-EMM ontology, when it deals with deaths at all, deals only
with deaths of people, so its "dateDeath" predicate maps to our
#$dateOfDeath, but is more restrictive (our predicate relates living
organisms to their death-dates).
 
;;;;
 
Different arg-order:
 
 (synonymousExternalPred-Inverse PRED SOURCE STRING)
 
E.g.,
 
 (synonymousExternalPred-Inverse primeMinister
JRC-EMMOntology 
  "isPrimeMinisterFor")
 
(self-explanatory)
 
;;;;
 
Combination of arg-order and constraint-variance:
 
 (synonymousExternalPredWRTTypes-Inverse RELN SOURCE STRING
TYPE1 TYPE2)
 
E.g.,
 
 (synonymousExternalPredWRTTypes-Inverse children
JRC-EMMOntology 
  "childOf" Person Person)
 
The Cyc predicate #$children relates two animals, in order of parent
to child.  The JRC-EMM predicate "childOf" relates two people, in
order of child to parent.
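
As a rough illustration of how the mapping predicates above could be
applied, here is a minimal Python sketch (Python, not CycL, and not how
Cyc itself does it). It rewrites a binary assertion from a source
ontology into Cyc's vocabulary, swapping the arguments for the -Inverse
predicates. The table layout, the function name translate, and the
sample individuals "Alice" and "Bob" are all assumptions made for
illustration; checking the TYPE1/TYPE2 constraints is omitted.

```python
# Hypothetical, simplified model of the mapping assertions shown above.
# Each entry: (mapping predicate, Cyc predicate, source ontology, source predicate)
MAPPINGS = [
    ("synonymousExternalPredWRTTypes", "dateOfDeath",
     "JRC-EMMOntology", "dateDeath"),
    ("synonymousExternalPred-Inverse", "primeMinister",
     "JRC-EMMOntology", "isPrimeMinisterFor"),
    ("synonymousExternalPredWRTTypes-Inverse", "children",
     "JRC-EMMOntology", "childOf"),
]


def translate(source, source_pred, arg1, arg2):
    """Return the Cyc-side (pred, arg1, arg2) for a source assertion, or None."""
    for kind, cyc_pred, src, src_pred in MAPPINGS:
        if src == source and src_pred == source_pred:
            if kind.endswith("-Inverse"):
                # The source predicate has the opposite argument order.
                return (cyc_pred, arg2, arg1)
            return (cyc_pred, arg1, arg2)
    return None  # no mapping known for this source predicate


# childOf is child-to-parent; children is parent-to-child.
print(translate("JRC-EMMOntology", "childOf", "Alice", "Bob"))
# prints ('children', 'Bob', 'Alice')
```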
 
 
------------------------------------------------
 
Here's a summary of the SKSI mapping vocabulary we
employ:
----------------------------------------------------------------
 
The term #$StructuredKnowledgeSource in the CycL language
denotes the 
collection of all knowledge sources which have a well
defined schema 
that can be used to access and interpret the source's
data.  When a new 
database instance is integrated with Cyc, it is denoted
by a unique term 
in CycL that is an instance of the collection
#$Database-Physical, a 
specialization of #$StructuredKnowledgeSource.
 
A knowledge source is related to its parts using the CycL predicate
#$subKS-Direct. For each table contained within a database, we create
an instance of #$DatabaseTable-Physical in CycL, relate it to the CycL
term for the parent database using #$subKS-Direct, and assert the name
of the table.
 
The literal structure of part or all of a structured
knowledge source is 
described by its physical schema, denoted in Cyc by the
collection 
#$PhysicalSchema. A physical field is an abstraction of a
column in a 
database table.  The CycL term #$PhysicalField
denotes the collection of 
all physical fields. A physical schema has one physical field for
each column of a table, and each field bears the same name as the
column it represents. A
physical field is determined uniquely by its associated
physical schema 
and name, so we introduce a binary function
#$PhysicalFieldFn which 
takes an instance of #$PhysicalSchema as its first
argument and a string 
as its second argument. Functional expressions denote
unique instances 
of #$PhysicalField. A physical schema is related to its
fields using the 
CycL predicate #$physicalFields.
 
Here are some examples drawn from a mapping of Cyc to the
USGS GNIS 
database:
 
(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn
#$USGS-GNIS-PS "fid"))
(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn
#$USGS-GNIS-PS "name"))
(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn
#$USGS-GNIS-PS "type"))
(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn
#$USGS-GNIS-PS 
"state_fips"))
(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn
#$USGS-GNIS-PS 
"county_fips"))
 
Field data types are represented by associating the field
with a CycL 
term denoting its datatype:
 
(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS
"fid") #$Integer)
(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS
"name") 
#$CharacterString)
(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS
"type") 
#$CharacterString)
(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS
"state_fips") 
(#$StringOfLengthFn 2))
(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS
"county_fips") 
(#$StringOfLengthFn 3))
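
To make the role of the field data types concrete, here is a minimal
Python sketch (not CycL) that treats each #$fieldDataType assertion
above as a check on raw column values. The predicate functions, the
name invalid_fields, and the sample row values are assumptions for
illustration only; they are not taken from the GNIS database or from
Cyc.

```python
def string_of_length(n):
    """Stand-in for (#$StringOfLengthFn n): strings of exactly n characters."""
    return lambda v: isinstance(v, str) and len(v) == n


# One check per physical field, mirroring the #$fieldDataType sentences.
FIELD_DATA_TYPES = {
    "fid": lambda v: isinstance(v, int) and not isinstance(v, bool),  # #$Integer
    "name": lambda v: isinstance(v, str),        # #$CharacterString
    "type": lambda v: isinstance(v, str),        # #$CharacterString
    "state_fips": string_of_length(2),
    "county_fips": string_of_length(3),
}


def invalid_fields(row):
    """Return the sorted names of fields that are missing or ill-typed."""
    return sorted(
        name for name, ok in FIELD_DATA_TYPES.items()
        if name not in row or not ok(row[name])
    )


# Hypothetical row; the values are made up for the example.
row = {"fid": 1, "name": "Example Summit", "type": "summit",
       "state_fips": "48", "county_fips": "453"}
print(invalid_fields(row))                          # prints []
print(invalid_fields({**row, "state_fips": "4"}))   # prints ['state_fips']
```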
 
The logical schema of a knowledge source (database or
database table) is 
the semantic analogue of its physical schema. It
describes how the 
content of a table is interpreted in the broader Cyc
ontology. The 
collection of all logical schemas is denoted in CycL by
the term 
#$LogicalSchema. Typically, one logical schema is
associated with one 
physical schema for each database table.
 
In CycL, a logical field type is a collection in the ontology, and
each instance of #$LogicalField is related to some instance of
#$Collection, which determines its type. As with instances of
#$PhysicalField, we construct instances of #$LogicalField using a
function, #$LogicalFieldFn, that takes the logical schema as its first
argument and the logical field's type as its second argument.
However, since a table may have multiple columns that correspond to
the same type of object, stating the type alone is not sufficient to
uniquely identify a logical field within a logical schema, so we use
a unique integer as the third argument of the function. The choice of
the integer is arbitrary, as long as it results in a term that is
distinct from the other logical fields for the schema. In addition, we
relate a logical schema to its fields using the CycL predicate
#$logicalFields.
 
(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn #$USGS-GNIS-LS
#$Place 
1))
(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn
#$USGS-GNIS-LS 
#$ProperNameString 1))
(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn
#$USGS-GNIS-LS 
#$CartographicFeatureType 1))
(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn
#$USGS-GNIS-LS 
#$State-UnitedStates 1))
(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn
#$USGS-GNIS-LS 
#$USCounty 1))
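
The need for the integer argument can be seen in a small Python sketch
(not CycL). Here a logical field is modeled as a (schema, type, index)
triple, and a hypothetical schema "Example-LS" has two columns of type
Place. The names LogicalField and make_logical_fields, the example
schema, and the sequential numbering are assumptions for illustration;
as noted above, any integer that keeps the terms distinct would do.

```python
from collections import namedtuple

# Stand-in for a term of the form (#$LogicalFieldFn SCHEMA TYPE N).
LogicalField = namedtuple("LogicalField", ["schema", "field_type", "index"])


def make_logical_fields(schema, column_types):
    """Give each column the next unused integer for its type."""
    counts = {}
    fields = []
    for field_type in column_types:
        counts[field_type] = counts.get(field_type, 0) + 1
        fields.append(LogicalField(schema, field_type, counts[field_type]))
    return fields


# Hypothetical schema with two Place-typed columns.
fields = make_logical_fields("Example-LS",
                             ["Place", "ProperNameString", "Place"])
for f in fields:
    print(f)

# Without the index, the two Place fields would be the same term.
assert len(set(fields)) == len(fields)
assert len({(f.schema, f.field_type) for f in fields}) < len(fields)
```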
 
In object-oriented databases and many relational
databases, tables often
correspond to natural classes of objects in the world, and
the rows of 
such tables may implicitly denote the objects of the
class. In such 
cases, the table's primary key provides an identifier
that may be used 
to uniquely distinguish between objects of the class, in
addition to 
merely distinguishing between rows of a table. We
distinguish such 
tables by assigning to them a special type of logical
schema, called an 
object defining schema, which is denoted in CycL by the
collection 
#$ObjectDefiningSchema, a specialization or subcollection
of the 
collection #$LogicalSchema.
 
For example, the gnis.type column contains coded values
that describe 
the type of cartographic feature represented by a feature
in the 
database. The physical field corresponding to this column
is
 
(#$PhysicalFieldFn #$USGS-GNIS-PS "type")
 
and the logical field corresponding to this column is
 
(#$LogicalFieldFn #$USGS-GNIS-LS #$CartographicFeatureType 1)
 
The correspondence between the values for the physical field and the
values for the logical field (instances of #$CartographicFeatureType)
is recorded in the reified mapping #$Usgs-FeatureType-CMLS using the
#$codeMapping predicate. Here is a sample of these sentences:
 
(#$codeMapping #$Usgs-FeatureType-CMLS "cave"
#$Cave)
(#$codeMapping #$Usgs-FeatureType-CMLS "other"
#$Place)
(#$codeMapping #$Usgs-FeatureType-CMLS "ruin"
#$RuinedArtifact)
(#$codeMapping #$Usgs-FeatureType-CMLS
"unknown" #$Place)
(#$codeMapping #$Usgs-FeatureType-CMLS "summit"
#$Summit)
(#$codeMapping #$Usgs-FeatureType-CMLS "slope"
#$Slope-Topographical)
(#$codeMapping #$Usgs-FeatureType-CMLS "ridge"
#$Ridge-Hill)
(#$codeMapping #$Usgs-FeatureType-CMLS "ppl"
#$PopulatedPlace)
 
Finally, the following sentence tells Cyc that the
#$Usgs-FeatureType-CMLS reified mapping should be used to translate
the logical field above:
 
(#$logicalFieldMapping (#$LogicalFieldFn #$USGS-GNIS-LS
#$CartographicFeatureType 1) #$Usgs-FeatureType-CMLS)
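
In effect, the #$codeMapping sentences form a lookup table from raw
codes to Cyc collections. The following Python sketch (not CycL) models
the sample above as a dictionary; the function names are assumptions
for illustration. Note that the sample is not one-to-one ("other" and
"unknown" both map to #$Place), so going from a Cyc term back to a code
can yield several candidates. How Cyc resolves that is not covered
here.

```python
# The sample #$codeMapping sentences, as code -> Cyc term (without "#$").
CODE_MAPPING = {
    "cave": "Cave",
    "other": "Place",
    "ruin": "RuinedArtifact",
    "unknown": "Place",
    "summit": "Summit",
    "slope": "Slope-Topographical",
    "ridge": "Ridge-Hill",
    "ppl": "PopulatedPlace",
}


def decode_feature_type(code):
    """Raw gnis.type code -> Cyc term, or None if the code is unmapped."""
    return CODE_MAPPING.get(code)


def encode_feature_type(cyc_term):
    """Cyc term -> sorted list of all raw codes that map to it."""
    return sorted(c for c, t in CODE_MAPPING.items() if t == cyc_term)


print(decode_feature_type("ppl"))     # prints PopulatedPlace
print(encode_feature_type("Place"))   # prints ['other', 'unknown']
```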
 
Whereas the SQL standard and database management systems
conflate the 
two at the conceptual level, we distinguish between a
physical field and 
an arbitrary value of the physical field, and between a
logical field 
and an arbitrary value of the logical field. The values
of a physical 
field are called physical field indexicals and are
denoted in CycL by 
the collection #$PhysicalFieldIndexical. Similarly, the
values of a 
logical field are called logical field indexicals and are
denoted in 
CycL by the collection #$LogicalFieldIndexical. Exactly
one physical 
field indexical is created for each physical field, and
one logical 
field indexical for each logical field. To denote the
instances of 
#$PhysicalFieldIndexical and #$LogicalFieldIndexical we
introduce two 
additional functions, #$ThePhysicalFieldValueFn and 
#$TheLogicalFieldValueFn. They are the indexical
analogues of 
#$PhysicalFieldFn and #$LogicalFieldFn and have exactly
the same 
argument signature. Physical and logical indexicals are
related to their 
schema using the CycL predicates
#$physicalFieldIndexicals and 
#$logicalFieldIndexicals respectively, and to their
corresponding 
physical and logical fields using the CycL predicates 
#$indexicalForPhysicalField and
#$indexicalForLogicalField
respectively. The sentences and terms for the USGS GNIS
example fields are:
 
(#$physicalFieldIndexicals #$USGS-GNIS-PS
(#$ThePhysicalFieldValueFn 
#$USGS-GNIS-PS "fid"))
(#$physicalFieldIndexicals #$USGS-GNIS-PS
(#$ThePhysicalFieldValueFn 
#$USGS-GNIS-PS "name"))
(#$physicalFieldIndexicals #$USGS-GNIS-PS
(#$ThePhysicalFieldValueFn 
#$USGS-GNIS-PS "type"))
(#$physicalFieldIndexicals #$USGS-GNIS-PS
(#$ThePhysicalFieldValueFn 
#$USGS-GNIS-PS "state_fips"))
(#$physicalFieldIndexicals #$USGS-GNIS-PS
(#$ThePhysicalFieldValueFn 
#$USGS-GNIS-PS "county_fips"))
 
(#$indexicalForPhysicalField (#$PhysicalFieldFn
#$USGS-GNIS-PS "fid") 
(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS
"fid"))
(#$indexicalForPhysicalField (#$PhysicalFieldFn
#$USGS-GNIS-PS "name") 
(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS
"name"))
(#$indexicalForPhysicalField (#$PhysicalFieldFn
#$USGS-GNIS-PS "type") 
(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS
"type"))
(#$indexicalForPhysicalField (#$PhysicalFieldFn
#$USGS-GNIS-PS 
"state_fips") (#$ThePhysicalFieldValueFn
#$USGS-GNIS-PS "state_fips"))
(#$indexicalForPhysicalField (#$PhysicalFieldFn
#$USGS-GNIS-PS 
"county_fips") (#$ThePhysicalFieldValueFn
#$USGS-GNIS-PS "county_fips"))
 
(#$logicalFieldIndexicals #$USGS-GNIS-LS
(#$TheLogicalFieldValueFn 
#$USGS-GNIS-LS #$Place 1))
(#$logicalFieldIndexicals #$USGS-GNIS-LS
(#$TheLogicalFieldValueFn 
#$USGS-GNIS-LS #$ProperNameString 1))
(#$logicalFieldIndexicals #$USGS-GNIS-LS
(#$TheLogicalFieldValueFn 
#$USGS-GNIS-LS #$CartographicFeatureType 1))
(#$logicalFieldIndexicals #$USGS-GNIS-LS
(#$TheLogicalFieldValueFn 
#$USGS-GNIS-LS #$State-UnitedStates 1))
(#$logicalFieldIndexicals #$USGS-GNIS-LS
(#$TheLogicalFieldValueFn 
#$USGS-GNIS-LS #$USCounty 1))
 
(#$indexicalForLogicalField (#$LogicalFieldFn
#$USGS-GNIS-LS #$Place 1) 
(#$TheLogicalFieldValueFn #$USGS-GNIS-LS #$Place 1))
(#$indexicalForLogicalField (#$LogicalFieldFn
#$USGS-GNIS-LS 
#$ProperNameString 1) (#$TheLogicalFieldValueFn
#$USGS-GNIS-LS 
#$ProperNameString 1))
(#$indexicalForLogicalField (#$LogicalFieldFn
#$USGS-GNIS-LS 
#$CartographicFeatureType 1) (#$TheLogicalFieldValueFn
#$USGS-GNIS-LS 
#$CartographicFeatureType 1))
(#$indexicalForLogicalField (#$LogicalFieldFn
#$USGS-GNIS-LS 
#$State-UnitedStates 1) (#$TheLogicalFieldValueFn
#$USGS-GNIS-LS 
#$State-UnitedStates 1))
(#$indexicalForLogicalField (#$LogicalFieldFn
#$USGS-GNIS-LS #$USCounty 
1) (#$TheLogicalFieldValueFn #$USGS-GNIS-LS #$USCounty
1))
 
The schema translation process begins by relating a
physical schema to 
its associated logical schema. This is done using the
CycL predicate 
#$logicalPhysicalSchemaMap. For the USGS GNIS table, we
have
 
(#$logicalPhysicalSchemaMap #$USGS-GNIS-LS
#$USGS-GNIS-PS)
 
Physical and logical field translations are stated as
relationships 
between their corresponding indexical terms. These
relationships 
describe explicitly how to manipulate the raw data object
from its 
representation as a physical field value to its
representation as a 
logical field value, and vice versa. This is accomplished
using two CycL 
predicates, #$fieldDecoding and #$fieldEncoding. Below
are the schema 
translation templates for the gnis.fid physical and
logical fields that 
describe how the data values are translated, using the
physical and 
logical field indexicals as placeholders:
 
(#$fieldDecoding
 #$USGS-GNIS-LS
 (#$TheLogicalFieldValueFn #$USGS-GNIS-LS #$Place 1)
 #$USGS-GNIS-PS
 (#$SourceSchemaObjectFn
   #$USGS-KS
   #$USGS-GNIS-LS
   (#$ThePhysicalFieldValueFn #$USGS-GNIS-PS
"fid")))
 
(#$fieldEncoding
 #$USGS-GNIS-PS
 (#$ThePhysicalFieldValueFn #$USGS-GNIS-PS
"fid")
 #$USGS-GNIS-LS
 (#$SourceSchemaObjectIDFn
   #$USGS-KS
   #$USGS-GNIS-LS
   (#$TheLogicalFieldValueFn #$USGS-GNIS-LS
#$Place 1)))
 
The #$fieldDecoding says, in essence, that to convert a value for
gnis.fid into an instance of #$Place, plug it into the third argument
of the #$SourceSchemaObjectFn term, with the given terms #$USGS-KS and
#$USGS-GNIS-LS in the first and second arguments respectively, while
the #$fieldEncoding says that to convert an instance of #$Place to a
raw data value that is consistent with the constraints on gnis.fid,
plug it into the third argument of the #$SourceSchemaObjectIDFn term,
with the given terms #$USGS-KS and #$USGS-GNIS-LS in the first and
second arguments respectively.
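
The "plug it into the template" step can be sketched in Python (not
CycL) by treating each template as a nested tuple and each indexical as
a placeholder to be replaced. The function names and the sample fid
value 12345 are assumptions for illustration. In particular,
evaluate_id assumes that the ID function, applied to an object built
by #$SourceSchemaObjectFn, simply recovers that object's third
argument; the text above does not spell out how Cyc evaluates it.

```python
# Indexical placeholders, modeled as tuples.
PHYS_FID = ("ThePhysicalFieldValueFn", "USGS-GNIS-PS", "fid")
LOGICAL_PLACE = ("TheLogicalFieldValueFn", "USGS-GNIS-LS", "Place", 1)

# Templates from the #$fieldDecoding and #$fieldEncoding sentences.
DECODING_TEMPLATE = ("SourceSchemaObjectFn", "USGS-KS", "USGS-GNIS-LS",
                     PHYS_FID)
ENCODING_TEMPLATE = ("SourceSchemaObjectIDFn", "USGS-KS", "USGS-GNIS-LS",
                     LOGICAL_PLACE)


def substitute(template, placeholder, value):
    """Replace every occurrence of placeholder in a nested-tuple template."""
    if template == placeholder:
        return value
    if isinstance(template, tuple):
        return tuple(substitute(t, placeholder, value) for t in template)
    return template


def decode_fid(fid):
    """Raw gnis.fid value -> term denoting an instance of Place."""
    return substitute(DECODING_TEMPLATE, PHYS_FID, fid)


def encode_place(place_term):
    """Place term -> ID-function term that denotes the raw value."""
    return substitute(ENCODING_TEMPLATE, LOGICAL_PLACE, place_term)


def evaluate_id(id_term):
    """ASSUMPTION: the ID function returns the object's third argument."""
    fn, _ks, _ls, obj = id_term
    if fn != "SourceSchemaObjectIDFn":
        raise ValueError("not a SourceSchemaObjectIDFn term")
    return obj[-1]


place = decode_fid(12345)          # 12345 is a made-up fid
print(place)
print(evaluate_id(encode_place(place)))   # prints 12345
```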
 
-- 
----------------------------------
Douglas Lenat
CEO, Cycorp
7718 Wood Hollow Drive, Suite 250
Austin, TX 78731
 
phone: 512-342-4001
cell: 512-773-1709
email: Doug@xxxxxxx
----------------------------------