Forward from Doug based on questions asked at the OOR panel
session today.
Thanks, Doug!
Here were the questions, responses:
[14:30] LeoObrst: Question to Doug: Given Cyc's long
experience with such matters, can you provide the OOR group with what you would
suggest as a "small set of inter-ontology alignment relations"? Which
are necessary and which are desirable?
[14:45] doug lenat: Response to Leo: Yes, I would be
happy to provide the set of (surprisingly few) predicates we use to state those
inter-ontology correspondences. I will send that out and/or post it today
or tomorrow.
See the chat session for other interesting discussion.
Leo
-----Original Message-----
From: Doug Lenat [mailto:doug@xxxxxxx]
Sent: Thursday, April 03, 2008 5:46 PM
To: Obrst, Leo J.; Pat Hayes; Peter P. Yim
Subject: promised material on expressing the mappings Cyc to other ontologies
This is the material I promised during the conference call, about
mapping Cyc to other ontologies. Please circulate or post this, as
appropriate.
There are two different sorts of "mapping to other
ontologies/schemata"
that we do from Cyc:
* mapping between terms in Cyc and terms in other
ontologies
* mapping between terms in Cyc and terms in databases (or
SUBMIT-able
web pages). We call this SKSI for Semantic
Knowledge Source Integration.
The next two sections treat these two asterisked
processes in turn.
To be clear, this is not HOW we figure out the mappings. It is the
vocabulary of CycL predicates, collections, etc. that we employ to
express a mapping formally, in a form that lets the standard Cyc
inference engine make use of information stored in those alien
sources, e.g., as part of discharging some sub-sub-...-problem.
There is a third sort of mapping which is NOT covered
here, namely where
we are exporting entire sets of Cyc assertions to
languages (e.g., OWL)
which are strictly less expressive than Cyc's
representation language,
and need to have special ways of flattening HOL
assertions into FOPC, or
full FOPC assertions into description logic, etc.
--------------------------------------------------------
Here's a summary of the predicates etc. we use to map to
other ontologies:
---------------------------------------------------------------------------
exact mapping:
(synonymousExternalConcept CYC-TERM SOURCE SOURCE-TERM)
CYC-TERM maps to SOURCE-TERM in SOURCE (the ontology/representation
system where SOURCE-TERM 'lives').
E.g.,
(synonymousExternalConcept DrinkingMug LSCOMObjectAndSituationOntology "Mug")
;;;;;;;;;;;;;;;
close mapping:
(overlappingExternalConcept CYC-TERM SOURCE
SOURCE-TERM)
CYC-TERM closely maps to SOURCE-TERM in SOURCE.
This relation might hold under a number of
conditions:
* SOURCE-TERM conflates some concepts (so that there are
two or more Cyc-term entries with which it
overlaps),
* SOURCE-TERM is a predicate with the same meaning as
CYC-TERM, but
has a different arg-order.
* SOURCE-TERM denotes a predicate with different (usually
tighter)
argument constraints than CYC-TERM (due to certain
contextual assumptions
built into SOURCE ontology).
So overlappingExternalConcept assertions are sometimes
inferred from
more specific mapping predicates, e.g.,
;;;;
Different Arg Constraints:
(synonymousExternalPredWRTTypes CYC-PRED SOURCE
S-PRED TYPE1 TYPE2)
* S-PRED denotes a specialization of CYC-PRED, where the
specialization is defined by the tightening of the 1st and 2nd
arguments as specified by TYPE1 and TYPE2.
E.g.,
(synonymousExternalPredWRTTypes dateOfDeath
JRC-EMMOntology
"dateDeath" Person Date)
The JRC-EMM ontology only deals with deaths of people
when they deal
with deaths at all, so their "dateDeath"
predicate maps to our
#$dateOfDeath, but is more restrictive (our predicate
relates living
organisms to their death-dates).
;;;;
Different arg-order:
(synonymousExternalPred-Inverse PRED SOURCE STRING)
E.g.,
(synonymousExternalPred-Inverse primeMinister
JRC-EMMOntology
"isPrimeMinisterFor")
(self-explanatory)
;;;;
Combination of arg-order and constraint-variance:
(synonymousExternalPredWRTTypes-Inverse RELN SOURCE STRING TYPE1 TYPE2)
E.g.,
(synonymousExternalPredWRTTypes-Inverse children
JRC-EMMOntology
"childOf" Person Person)
The Cyc predicate #$children relates two animals, in
order of parent
to child. JRC-EMM ontology relates two people, in
order of child to
parent.
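To make the effect of these predicates concrete, here is a minimal
Python sketch of how a binary assertion from the source ontology could
be rewritten on the Cyc side. It is only an illustration and not how Cyc
itself does it: the MAPPINGS table, the to_cyc function and the argument
names PersonA and PersonB are all hypothetical. The type constraints are
recorded but not checked.

```python
# Hypothetical in-memory form of the mapping assertions quoted above.
# Key: (source ontology, source predicate name)
# Value: (Cyc predicate, swap argument order?, type constraints or None)
MAPPINGS = {
    ("JRC-EMMOntology", "dateDeath"): ("dateOfDeath", False, ("Person", "Date")),
    ("JRC-EMMOntology", "isPrimeMinisterFor"): ("primeMinister", True, None),
    ("JRC-EMMOntology", "childOf"): ("children", True, ("Person", "Person")),
}


def to_cyc(source, pred, arg1, arg2):
    """Rewrite a binary source assertion as a CycL-style sentence string."""
    cyc_pred, swap, _types = MAPPINGS[(source, pred)]
    if swap:
        # The "-Inverse" mappings reverse the argument order.
        arg1, arg2 = arg2, arg1
    return f"({cyc_pred} {arg1} {arg2})"


# childOf runs child-to-parent, #$children runs parent-to-child.
print(to_cyc("JRC-EMMOntology", "childOf", "PersonA", "PersonB"))
```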
------------------------------------------------
Here's a summary of the SKSI mapping vocabulary we
employ:
----------------------------------------------------------------
The term #$StructuredKnowledgeSource in the CycL language
denotes the
collection of all knowledge sources which have a well
defined schema
that can be used to access and interpret the source's
data. When a new
database instance is integrated with Cyc, it is denoted
by a unique term
in CycL that is an instance of the collection
#$Database-Physical, a
specialization of #$StructuredKnowledgeSource.
A knowledge source is related to its parts using the CycL predicate
#$subKS-Direct. We create an instance of #$DatabaseTable-Physical in
CycL for each table contained within a database, relate it to the CycL
term for the parent database using #$subKS-Direct, and assert the name
of the table.
The literal structure of part or all of a structured
knowledge source is
described by its physical schema, denoted in Cyc by the
collection
#$PhysicalSchema. A physical field is an abstraction of a
column in a
database table. The CycL term #$PhysicalField
denotes the collection of
all physical fields. A physical schema has one physical
field for each
column of a table and bears the same name as the column
it represents. A
physical field is determined uniquely by its associated
physical schema
and name, so we introduce a binary function
#$PhysicalFieldFn which
takes an instance of #$PhysicalSchema as its first
argument and a string
as its second argument. Functional expressions denote
unique instances
of #$PhysicalField. A physical schema is related to its
fields using the
CycL predicate #$physicalFields.
Here are some examples drawn from a mapping of Cyc to the
USGS GNIS
database:
(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn #$USGS-GNIS-PS "fid"))
(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn #$USGS-GNIS-PS "name"))
(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn #$USGS-GNIS-PS "type"))
(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn #$USGS-GNIS-PS "state_fips"))
(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn #$USGS-GNIS-PS "county_fips"))
Field data types are represented by associating the field
with a CycL
term denoting its datatype:
(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS "fid") #$Integer)
(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS "name") #$CharacterString)
(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS "type") #$CharacterString)
(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS "state_fips") (#$StringOfLengthFn 2))
(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS "county_fips") (#$StringOfLengthFn 3))
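Since the physical schema sentences follow directly from the table's
column list, they can be generated mechanically. The Python sketch
below shows the idea. The helper physical_schema_sentences is a
hypothetical name, not part of Cyc or SKSI, and the choice of datatype
term for each column is assumed to be supplied by the person doing the
mapping.

```python
def physical_schema_sentences(schema, columns):
    """Return CycL-style sentences for (column name, datatype term) pairs.

    schema  -- name of the physical schema constant, without the #$ prefix
    columns -- list of (column name, CycL datatype term) tuples
    """
    sentences = []
    for name, datatype in columns:
        # One physical field per column, named after the column.
        field = f'(#$PhysicalFieldFn #${schema} "{name}")'
        sentences.append(f"(#$physicalFields #${schema} {field})")
        sentences.append(f"(#$fieldDataType {field} {datatype})")
    return sentences


for sentence in physical_schema_sentences(
    "USGS-GNIS-PS",
    [("fid", "#$Integer"), ("state_fips", "(#$StringOfLengthFn 2)")],
):
    print(sentence)
```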
The logical schema of a knowledge source (database or
database table) is
the semantic analogue of its physical schema. It
describes how the
content of a table is interpreted in the broader Cyc
ontology. The
collection of all logical schemas is denoted in CycL by
the term
#$LogicalSchema. Typically, one logical schema is
associated with one
physical schema for each database table.
In CycL, a logical field type is a collection in the ontology and each
instance of #$LogicalField is related to some instance of #$Collection,
which determines its type. As with instances of #$PhysicalField, we
construct instances of #$LogicalField using a function,
#$LogicalFieldFn, which takes the logical schema as its first argument
and the logical field's type as its second argument. However, since a
table may have multiple columns that correspond to the same type of
object, the type alone is not sufficient to uniquely identify a logical
field within a logical schema, so we use a unique integer as the third
argument of the function. The choice of integer is arbitrary, as long
as it results in a term that is distinct from the other logical fields
for the schema. We relate a logical schema to its fields using the CycL
predicate #$logicalFields.
(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn #$USGS-GNIS-LS #$Place 1))
(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn #$USGS-GNIS-LS #$ProperNameString 1))
(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn #$USGS-GNIS-LS #$CartographicFeatureType 1))
(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn #$USGS-GNIS-LS #$State-UnitedStates 1))
(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn #$USGS-GNIS-LS #$USCounty 1))
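The role of the integer in the third argument shows up only when two
columns share a type, which does not happen in the GNIS example. The
Python sketch below uses a made-up schema, Example-LS, with two #$Place
columns to show one way of picking distinct integers. Counting up per
type is just one choice, since any distinct integers would do, and the
function name logical_field_terms is hypothetical.

```python
from collections import Counter


def logical_field_terms(schema, column_types):
    """Build one #$LogicalFieldFn term per column, numbering repeats of a type."""
    seen = Counter()
    terms = []
    for type_term in column_types:
        # The integer only has to make the term unique within the schema;
        # a per-type running count is one simple way to guarantee that.
        seen[type_term] += 1
        terms.append(
            f"(#$LogicalFieldFn #${schema} {type_term} {seen[type_term]})"
        )
    return terms


# Example-LS is a hypothetical schema with two columns of type #$Place.
for term in logical_field_terms("Example-LS", ["#$Place", "#$Person", "#$Place"]):
    print(term)
```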
In object-oriented databases and many relational databases, tables often
correspond to natural classes of objects in the world, and
the rows of
such tables may implicitly denote the objects of the
class. In such
cases, the table's primary key provides an identifier
that may be used
to uniquely distinguish between objects of the class, in
addition to
merely distinguishing between rows of a table. We
distinguish such
tables by assigning to them a special type of logical
schema, called an
object defining schema, which is denoted in CycL by the
collection
#$ObjectDefiningSchema, a specialization or subcollection
of the
collection #$LogicalSchema.
For example, the gnis.type column contains coded values that describe
the type of cartographic feature represented by a feature in the
database. The physical field corresponding to this column is
(#$PhysicalFieldFn #$USGS-GNIS-PS "type")
and the logical field corresponding to this column is
(#$LogicalFieldFn #$USGS-GNIS-LS #$CartographicFeatureType 1)
The correspondence between the values for the physical field and the
values for the logical field (instances of #$CartographicFeatureType)
is recorded in the reified mapping #$Usgs-FeatureType-CMLS using the
#$codeMapping predicate. Here is a sample of these sentences:
(#$codeMapping #$Usgs-FeatureType-CMLS "cave" #$Cave)
(#$codeMapping #$Usgs-FeatureType-CMLS "other" #$Place)
(#$codeMapping #$Usgs-FeatureType-CMLS "ruin" #$RuinedArtifact)
(#$codeMapping #$Usgs-FeatureType-CMLS "unknown" #$Place)
(#$codeMapping #$Usgs-FeatureType-CMLS "summit" #$Summit)
(#$codeMapping #$Usgs-FeatureType-CMLS "slope" #$Slope-Topographical)
(#$codeMapping #$Usgs-FeatureType-CMLS "ridge" #$Ridge-Hill)
(#$codeMapping #$Usgs-FeatureType-CMLS "ppl" #$PopulatedPlace)
Finally, the following sentence tells Cyc that the
#$Usgs-FeatureType-CMLS reified mapping should be used to translate the
logical field above:
(#$logicalFieldMapping (#$LogicalFieldFn #$USGS-GNIS-LS #$CartographicFeatureType 1) #$Usgs-FeatureType-CMLS)
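Viewed as data, the code mapping is a lookup table from raw column
values to Cyc collections. The Python sketch below treats the sample
#$codeMapping sentences as a dictionary. The function names are
hypothetical and this says nothing about how Cyc stores or uses the
mapping internally. In this sample the table is not one-to-one, since
both "other" and "unknown" map to #$Place, so going from a Cyc term
back to a code can yield more than one candidate.

```python
# The sample #$codeMapping sentences, as a plain lookup table.
CODE_MAPPING = {
    "cave": "#$Cave",
    "other": "#$Place",
    "ruin": "#$RuinedArtifact",
    "unknown": "#$Place",
    "summit": "#$Summit",
    "slope": "#$Slope-Topographical",
    "ridge": "#$Ridge-Hill",
    "ppl": "#$PopulatedPlace",
}


def decode_feature_type(code):
    """Raw gnis.type value -> Cyc collection term."""
    return CODE_MAPPING[code]


def codes_for(term):
    """Cyc collection term -> every raw code that maps to it."""
    return [code for code, mapped in CODE_MAPPING.items() if mapped == term]


print(decode_feature_type("ppl"))
print(codes_for("#$Place"))
```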
Whereas the SQL standard and database management systems
conflate the
two at the conceptual level, we distinguish between a
physical field and
an arbitrary value of the physical field, and between a
logical field
and an abritrary value of the logical field. The values
of a physical
field are called physical field indexicals and are
denoted in CycL by
the collection #$PhysicalFieldIndexical. Similarly, the
values of a
logical field are called logical field indexicals and are
denoted in
CycL by the collection #$LogicalFieldIndexical. Exactly
one physical
field indexical is created for each physical field, and
one logical
field indexical for each logical field. To denote the
instances of
#$PhysicalFieldIndexical and #$LogicalFieldIndexical we
introduce two
additional functions, #$ThePhysicalFieldValueFn and
#$TheLogicalFieldValueFn. They are the indexical
analogues of
#$PhysicalFieldFn and #$LogicalFieldFn and have exactly
the same
argument signature. Physical and logical indexicals are
related to their
schema using the CycL predicates
#$physicalFieldIndexicals and
#$logicalFieldIndexicals respectively, and to their
corresponding
physical and logical fields using the CycL predicates
#$indexicalForPhysicalField and #$indexicalForLogicalField
respectively. The sentences and terms for the USGS GNIS
example fields are:
(#$physicalFieldIndexicals #$USGS-GNIS-PS
(#$ThePhysicalFieldValueFn
#$USGS-GNIS-PS "fid"))
(#$physicalFieldIndexicals #$USGS-GNIS-PS
(#$ThePhysicalFieldValueFn
#$USGS-GNIS-PS "name"))
(#$physicalFieldIndexicals #$USGS-GNIS-PS
(#$ThePhysicalFieldValueFn
#$USGS-GNIS-PS "type"))
(#$physicalFieldIndexicals #$USGS-GNIS-PS
(#$ThePhysicalFieldValueFn
#$USGS-GNIS-PS "state_fips"))
(#$physicalFieldIndexicals #$USGS-GNIS-PS
(#$ThePhysicalFieldValueFn
#$USGS-GNIS-PS "county_fips"))
(#$indexicalForPhysicalField (#$PhysicalFieldFn
#$USGS-GNIS-PS "fid")
(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS
"fid"))
(#$indexicalForPhysicalField (#$PhysicalFieldFn
#$USGS-GNIS-PS "name")
(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS
"name"))
(#$indexicalForPhysicalField (#$PhysicalFieldFn
#$USGS-GNIS-PS "type")
(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS
"type"))
(#$indexicalForPhysicalField (#$PhysicalFieldFn
#$USGS-GNIS-PS
"state_fips") (#$ThePhysicalFieldValueFn
#$USGS-GNIS-PS "state_fips"))
(#$indexicalForPhysicalField (#$PhysicalFieldFn
#$USGS-GNIS-PS
"county_fips") (#$ThePhysicalFieldValueFn
#$USGS-GNIS-PS "county_fips"))
(#$logicalFieldIndexicals #$USGS-GNIS-LS
(#$TheLogicalFieldValueFn
#$USGS-GNIS-LS #$Place 1))
(#$logicalFieldIndexicals #$USGS-GNIS-LS
(#$TheLogicalFieldValueFn
#$USGS-GNIS-LS #$ProperNameString 1))
(#$logicalFieldIndexicals #$USGS-GNIS-LS
(#$TheLogicalFieldValueFn
#$USGS-GNIS-LS #$CartographicFeatureType 1))
(#$logicalFieldIndexicals #$USGS-GNIS-LS
(#$TheLogicalFieldValueFn
#$USGS-GNIS-LS #$State-UnitedStates 1))
(#$logicalFieldIndexicals #$USGS-GNIS-LS
(#$TheLogicalFieldValueFn
#$USGS-GNIS-LS #$USCounty 1))
(#$indexicalForLogicalField (#$LogicalFieldFn
#$USGS-GNIS-LS #$Place 1)
(#$TheLogicalFieldValueFn #$USGS-GNIS-LS #$Place 1))
(#$indexicalForLogicalField (#$LogicalFieldFn
#$USGS-GNIS-LS
#$ProperNameString 1) (#$TheLogicalFieldValueFn
#$USGS-GNIS-LS
#$ProperNameString 1))
(#$indexicalForLogicalField (#$LogicalFieldFn
#$USGS-GNIS-LS
#$CartographicFeatureType 1) (#$TheLogicalFieldValueFn
#$USGS-GNIS-LS
#$CartographicFeatureType 1))
(#$indexicalForLogicalField (#$LogicalFieldFn
#$USGS-GNIS-LS
#$State-UnitedStates 1) (#$TheLogicalFieldValueFn
#$USGS-GNIS-LS
#$State-UnitedStates 1))
(#$indexicalForLogicalField (#$LogicalFieldFn
#$USGS-GNIS-LS #$USCounty
1) (#$TheLogicalFieldValueFn #$USGS-GNIS-LS #$USCounty
1))
The schema translation process begins by relating a
physical schema to
its associated logical schema. This is done using the
CycL predicate
#$logicalPhysicalSchemaMap. For the USGS GNIS table, we
have
(#$logicalPhysicalSchemaMap #$USGS-GNIS-LS
#$USGS-GNIS-PS)
Physical and logical field translations are stated as
relationships
between their corresponding indexical terms. These
relationships
describe explicitly how to manipulate the raw data object
from its
representation as a physical field value to its
representation as a
logical field value, and vice versa. This is accomplished
using two CycL
predicates, #$fieldDecoding and #$fieldEncoding. Below
are the schema
translation templates for the gnis.fid physical and
logical fields that
describe how the data values are translated, using the
physical and
logical field indexicals as placeholders:
(#$fieldDecoding
#$USGS-GNIS-LS
(#$TheLogicalFieldValueFn #$USGS-GNIS-LS #$Place 1)
#$USGS-GNIS-PS
(#$SourceSchemaObjectFn
#$USGS-KS
#$USGS-GNIS-LS
(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS
"fid")))
(#$fieldEncoding
#$USGS-GNIS-PS
(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS
"fid")
#$USGS-GNIS-LS
(#$SourceSchemaObjectIDFn
#$USGS-KS
#$USGS-GNIS-LS
(#$TheLogicalFieldValueFn #$USGS-GNIS-LS
#$Place 1)))
The #$fieldDecoding says, in essence, that to convert a value for
gnis.fid into an instance of #$Place, plug it into the third argument
of the #$SourceSchemaObjectFn term, with the given terms #$USGS-KS and
#$USGS-GNIS-LS in the first and second arguments respectively. The
#$fieldEncoding says that to convert an instance of #$Place to a raw
data value that is consistent with the constraints on gnis.fid, plug it
into the third argument of the #$SourceSchemaObjectIDFn term, with the
given terms #$USGS-KS and #$USGS-GNIS-LS in the first and second
arguments respectively.
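The "indexical as placeholder" idea in these templates can be mimicked
with simple string substitution. The Python sketch below only
illustrates how a template is instantiated and makes no claim about how
the Cyc inference engine applies #$fieldDecoding or #$fieldEncoding.
The instantiate helper is hypothetical and the fid value 12345 is made
up.

```python
# Indexical terms from the GNIS example, used as placeholders.
PHYS_FID = '(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS "fid")'
LOGICAL_PLACE = "(#$TheLogicalFieldValueFn #$USGS-GNIS-LS #$Place 1)"

# Right-hand sides of the #$fieldDecoding and #$fieldEncoding sentences.
DECODING_TEMPLATE = f"(#$SourceSchemaObjectFn #$USGS-KS #$USGS-GNIS-LS {PHYS_FID})"
ENCODING_TEMPLATE = f"(#$SourceSchemaObjectIDFn #$USGS-KS #$USGS-GNIS-LS {LOGICAL_PLACE})"


def instantiate(template, indexical, value):
    """Replace the indexical placeholder in a template with a concrete value."""
    return template.replace(indexical, str(value))


# Decoding: a raw fid (made-up value) becomes a term denoting a #$Place.
place_term = instantiate(DECODING_TEMPLATE, PHYS_FID, 12345)
print(place_term)

# Encoding: that term is plugged into the ID function's third argument.
print(instantiate(ENCODING_TEMPLATE, LOGICAL_PLACE, place_term))
```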
--
----------------------------------
Douglas Lenat
CEO, Cycorp
7718 Wood Hollow Drive, Suite 250
Austin, TX 78731
phone: 512-342-4001
cell: 512-773-1709
email: Doug@xxxxxxx
----------------------------------