[ontology-summit] [Requirements] FW: promised material on expressing the

To:	"Ontology Summit 2008" <ontology-summit@xxxxxxxxxxxxxxxx>
From:	"Obrst, Leo J." <lobrst@xxxxxxxxx>
Date:	Thu, 3 Apr 2008 19:37:34 -0400
Message-id:	<9F771CF826DE9A42B548A08D90EDEA8002F4FA25@xxxxxxxxxxxxxxxxx>

Forward from Doug based on questions asked at the OOR panel session today.

Thanks, Doug!

Here were the questions, responses:

[14:30] LeoObrst: Question to Doug: Given Cyc's long experience with such matters, can you provide the OOR group with what you would suggest as a "small set of inter-ontology alignment relations"? Which are necessary and which are desirable?

[14:45] doug lenat: Response to Leo: Yes, I would be happy to provide the set of (surprisingly few) predicates we use to state those inter-ontology correspondences. I will send that out and/or post it today or tomorrow.

See the chat session for other interesting discussion.

Leo

-----Original Message-----
From: Doug Lenat [mailto:doug@xxxxxxx]
Sent: Thursday, April 03, 2008 5:46 PM
To: Obrst, Leo J.; Pat Hayes; Peter P. Yim
Subject: promised material on expressing the mappings Cyc to other ontologies

This is the material I promised during the conference call, about
mapping Cyc to other ontologies. Please circulate or post this, as
appropriate.

There are two different sorts of "mapping to other ontologies/schemata"

that we do from Cyc:

* mapping between terms in Cyc and terms in other ontologies

* mapping between terms in Cyc and terms in databases (or SUBMIT-able

web pages). We call this SKSI for Semantic Knowledge Source Integration.

The next two sections treat these two asterisked processes in turn.

To be clear, this is not HOW we figure out the mappings. This is the
vocabulary of CycL predicates, collections, etc. that we employ to
express each mapping formally, in a way that lets the standard Cyc
inference engine make use of information stored in those alien sources,
e.g., as part of discharging some sub-sub-...-problem.

There is a third sort of mapping which is NOT covered here, namely where

we are exporting entire sets of Cyc assertions to languages (e.g., OWL)

which are strictly less expressive than Cyc's representation language,

and need to have special ways of flattening HOL assertions into FOPC, or

full FOPC assertions into description logic, etc.

--------------------------------------------------------

Here's a summary of the predicates etc. we use to map to other ontologies:

---------------------------------------------------------------------------

exact mapping:

(synonymousExternalConcept CYC-TERM SOURCE SOURCE-TERM)

CYC-TERM maps to SOURCE-TERM in SOURCE (ontology/representation system

where SOURCE-TERM 'lives')

E.g.,

(synonymousExternalConcept DrinkingMug

LSCOMObjectAndSituationOntology "Mug")

;;;;;;;;;;;;;;;

close mapping:

(overlappingExternalConcept CYC-TERM SOURCE SOURCE-TERM)

CYC-TERM closely maps to SOURCE-TERM in SOURCE.

This relation might hold under a number of conditions:

* SOURCE-TERM conflates some concepts (so that there are

two or more Cyc-term entries with which it overlaps),

* SOURCE-TERM is a predicate with the same meaning as CYC-TERM, but

has a different arg-order.

* SOURCE-TERM denotes a predicate with different (usually tighter)

argument constraints than CYC-TERM (due to certain contextual assumptions

built into SOURCE ontology).

So overlappingExternalConcept assertions are sometimes inferred from

more specific mapping predicates, e.g.,

;;;;

Different Arg Constraints:

(synonymousExternalPredWRTTypes CYC-PRED SOURCE S-PRED TYPE1 TYPE2)

* S-PRED denotes a specialization of CYC-PRED, where the

specialization is defined by the tightening of the 1st and 2nd

arguments as specified by TYPE1 and TYPE2.

E.g.,

(synonymousExternalPredWRTTypes dateOfDeath JRC-EMMOntology

"dateDeath" Person Date)

The JRC-EMM ontology deals only with deaths of people, when it deals
with deaths at all, so its "dateDeath" predicate maps to our
#$dateOfDeath, but is more restrictive (our predicate relates living
organisms to their death-dates).

;;;;

Different arg-order:

(synonymousExternalPred-Inverse PRED SOURCE STRING)

E.g.,

(synonymousExternalPred-Inverse primeMinister JRC-EMMOntology

"isPrimeMinisterFor")

(self-explanatory)

;;;;

Combination of arg-order and constraint-variance:

(synonymousExternalPredWRTTypes-Inverse RELN SOURCE STRING TYPE1 TYPE2)

E.g.,

(synonymousExternalPredWRTTypes-Inverse children JRC-EMMOntology

"childOf" Person Person)

The Cyc predicate #$children relates two animals, in order of parent
to child. The JRC-EMM "childOf" predicate relates two people, in order
of child to parent.
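One way to read the mapping predicates above is as instructions for rewriting a statement from the source ontology into Cyc's vocabulary. The Python sketch below is an illustration only, not Cyc code: the `Mapping` class and `translate` helper are hypothetical names introduced here, a source statement is assumed to be a simple (predicate, arg1, arg2) triple, and the argument names in the usage note are placeholders. It shows the one behavioural difference between the plain and -Inverse predicates, namely that the arguments are swapped.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Mapping:
    """Hypothetical record of one predicate-mapping assertion."""
    cyc_pred: str
    source: str
    source_pred: str
    inverse: bool = False  # True for the -Inverse predicates
    # TYPE1/TYPE2 of the WRTTypes predicates; recorded here but not checked.
    arg_types: Optional[Tuple[str, str]] = None


# The three JRC-EMM examples from the text above.
MAPPINGS = [
    Mapping("dateOfDeath", "JRC-EMMOntology", "dateDeath",
            arg_types=("Person", "Date")),
    Mapping("primeMinister", "JRC-EMMOntology", "isPrimeMinisterFor",
            inverse=True),
    Mapping("children", "JRC-EMMOntology", "childOf",
            inverse=True, arg_types=("Person", "Person")),
]


def translate(source, triple):
    """Rewrite a (pred, arg1, arg2) triple from `source` into Cyc terms.

    Returns None when no mapping is recorded for the predicate.
    """
    pred, arg1, arg2 = triple
    for m in MAPPINGS:
        if m.source == source and m.source_pred == pred:
            args = (arg2, arg1) if m.inverse else (arg1, arg2)
            return (m.cyc_pred, *args)
    return None
```

For instance, `translate("JRC-EMMOntology", ("childOf", "ChildA", "ParentB"))` yields a `children` triple with the parent first, matching the parent-to-child order that #$children expects.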

------------------------------------------------

Here's a summary of the SKSI mapping vocabulary we employ:

----------------------------------------------------------------

The term #$StructuredKnowledgeSource in the CycL language denotes the

collection of all knowledge sources which have a well-defined schema

that can be used to access and interpret the source's data. When a new

database instance is integrated with Cyc, it is denoted by a unique term

in CycL that is an instance of the collection #$Database-Physical, a

specialization of #$StructuredKnowledgeSource.

A knowledge source is related to its parts using the CycL predicate
#$subKS-Direct. We create an instance of #$DatabaseTable-Physical in
CycL for each table contained within a database, relate it to the CycL
term for the parent database using #$subKS-Direct, and assert the name
of the table.

The literal structure of part or all of a structured knowledge source is

described by its physical schema, denoted in Cyc by the collection

#$PhysicalSchema. A physical field is an abstraction of a column in a

database table. The CycL term #$PhysicalField denotes the collection of

all physical fields. A physical schema has one physical field for each

column of a table, and each field bears the same name as its column. A

physical field is determined uniquely by its associated physical schema

and name, so we introduce a binary function #$PhysicalFieldFn which

takes an instance of #$PhysicalSchema as its first argument and a string

as its second argument. Functional expressions denote unique instances

of #$PhysicalField. A physical schema is related to its fields using the

CycL predicate #$physicalFields.

Here are some examples drawn from a mapping of Cyc to the USGS GNIS

database:

(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn #$USGS-GNIS-PS "fid"))

(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn #$USGS-GNIS-PS "name"))

(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn #$USGS-GNIS-PS "type"))

(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn #$USGS-GNIS-PS

"state_fips"))

(#$physicalFields #$USGS-GNIS-PS (#$PhysicalFieldFn #$USGS-GNIS-PS

"county_fips"))

Field data types are represented by associating the field with a CycL

term denoting its datatype:

(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS "fid") #$Integer)

(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS "name")

#$CharacterString)

(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS "type")

#$CharacterString)

(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS "state_fips")

(#$StringOfLengthFn 2))

(#$fieldDataType (#$PhysicalFieldFn #$USGS-GNIS-PS "county_fips")

(#$StringOfLengthFn 3))
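Because a physical field is fully determined by its schema and column name, the assertions above are mechanical to produce. The following Python sketch is only an illustration of that regularity, not part of Cyc or SKSI: the `PhysicalField` class and `schema_sentences` helper are hypothetical, and the CycL is emitted as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalField:
    """Stand-in for a (#$PhysicalFieldFn SCHEMA "name") term.

    Two instances with the same schema and name compare equal, mirroring
    the statement that schema plus name determines the field uniquely.
    """
    schema: str
    name: str

    def cycl(self) -> str:
        return f'(#$PhysicalFieldFn #${self.schema} "{self.name}")'


# Column name -> CycL datatype term, copied from the GNIS example.
GNIS_FIELD_TYPES = {
    "fid": "#$Integer",
    "name": "#$CharacterString",
    "type": "#$CharacterString",
    "state_fips": "(#$StringOfLengthFn 2)",
    "county_fips": "(#$StringOfLengthFn 3)",
}


def schema_sentences(schema, field_types):
    """Emit the #$physicalFields and #$fieldDataType sentences per column."""
    sentences = []
    for name, datatype in field_types.items():
        term = PhysicalField(schema, name).cycl()
        sentences.append(f"(#$physicalFields #${schema} {term})")
        sentences.append(f"(#$fieldDataType {term} {datatype})")
    return sentences


if __name__ == "__main__":
    for sentence in schema_sentences("USGS-GNIS-PS", GNIS_FIELD_TYPES):
        print(sentence)
```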

The logical schema of a knowledge source (database or database table) is

the semantic analogue of its physical schema. It describes how the

content of a table is interpreted in the broader Cyc ontology. The

collection of all logical schemas is denoted in CycL by the term

#$LogicalSchema. Typically, one logical schema is associated with one

physical schema for each database table.

In CycL, a logical field type is a collection in the ontology and each

instance of #$LogicalField is related to some instance of #$Collection,

which determines its type. As with #$PhysicalField, we construct

instances of #$LogicalField using a function, #$LogicalFieldFn, that

takes the logical schema as its first argument and the logical

field's type as its second argument. However since a table may have

multiple columns that correspond to the same type of object, just

stating the type alone is not sufficient to uniquely identify a logical

field within a logical schema. So in addition we use a unique integer in

the third argument of the function. The choice of the integer is

arbitrary, as long as it results in a term that is distinct from the

other logical fields for a schema. In addition, we relate a logical

schema to its fields using the CycL predicate #$logicalFields.

(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn #$USGS-GNIS-LS #$Place

1))

(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn #$USGS-GNIS-LS

#$ProperNameString 1))

(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn #$USGS-GNIS-LS

#$CartographicFeatureType 1))

(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn #$USGS-GNIS-LS

#$State-UnitedStates 1))

(#$logicalFields #$USGS-GNIS-LS (#$LogicalFieldFn #$USGS-GNIS-LS

#$USCounty 1))
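The role of the third argument is easiest to see when a schema has two fields of the same type. The Python sketch below is a hypothetical illustration, not Cyc code: the `LogicalSchema` class is invented here, and handing out integers sequentially per type is just one convenient policy, since the text only requires that the resulting terms be distinct.

```python
from collections import defaultdict


class LogicalSchema:
    """Hypothetical builder for (#$LogicalFieldFn SCHEMA TYPE N) terms."""

    def __init__(self, name):
        self.name = name
        self._count = defaultdict(int)  # field type -> integers used so far
        self.fields = []

    def add_field(self, field_type):
        """Register a field of `field_type`, choosing the next free integer."""
        self._count[field_type] += 1
        term = (self.name, field_type, self._count[field_type])
        self.fields.append(term)
        return term


def cycl(term):
    """Render a (schema, type, index) tuple as a CycL functional expression."""
    schema, field_type, index = term
    return f"(#$LogicalFieldFn #${schema} #${field_type} {index})"


if __name__ == "__main__":
    ls = LogicalSchema("USGS-GNIS-LS")
    first = ls.add_field("Place")
    # A second Place-typed column is hypothetical; GNIS as described
    # above has only one.
    second = ls.add_field("Place")
    print(cycl(first))
    print(cycl(second))
```

Without the integer, the two Place-typed fields would collapse into the same term.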

In object-oriented databases and many relational databases, tables often

correspond to natural classes of objects in the world, and the rows of

such tables may implicitly denote the objects of the class. In such

cases, the table's primary key provides an identifier that may be used

to uniquely distinguish between objects of the class, in addition to

merely distinguishing between rows of a table. We distinguish such

tables by assigning to them a special type of logical schema, called an

object defining schema, which is denoted in CycL by the collection

#$ObjectDefiningSchema, a specialization or subcollection of the

collection #$LogicalSchema.

For example, the gnis.type column contains coded values that describe

the type of cartographic feature represented by a feature in the

database. The physical field corresponding to this column is

(#$PhysicalFieldFn #$USGS-GNIS-PS "type")

and the logical field corresponding to this column is

(#$LogicalFieldFn #$USGS-GNIS-LS #$CartographicFeatureType 1)

The correspondence between the values for the physical field and the
values for the logical field (instances of #$CartographicFeatureType)
is recorded in the reified mapping #$Usgs-FeatureType-CMLS using the
#$codeMapping predicate. Here is a sample of these sentences:

(#$codeMapping #$Usgs-FeatureType-CMLS "cave" #$Cave)

(#$codeMapping #$Usgs-FeatureType-CMLS "other" #$Place)

(#$codeMapping #$Usgs-FeatureType-CMLS "ruin" #$RuinedArtifact)

(#$codeMapping #$Usgs-FeatureType-CMLS "unknown" #$Place)

(#$codeMapping #$Usgs-FeatureType-CMLS "summit" #$Summit)

(#$codeMapping #$Usgs-FeatureType-CMLS "slope" #$Slope-Topographical)

(#$codeMapping #$Usgs-FeatureType-CMLS "ridge" #$Ridge-Hill)

(#$codeMapping #$Usgs-FeatureType-CMLS "ppl" #$PopulatedPlace)

Finally, the following sentence tells Cyc that the

#$Usgs-FeatureType-CMLS reified mapping should be used to translate the

logical field above:

(#$logicalFieldMapping (#$LogicalFieldFn #$USGS-GNIS-LS

#$CartographicFeatureType 1) #$Usgs-FeatureType-CMLS)
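Operationally, a reified code mapping behaves like a lookup table from stored codes to Cyc terms. The Python sketch below is an illustration under that assumption and is not how Cyc implements #$codeMapping: the dictionary holds only the sample codes listed above, and `decode_feature_type` is a hypothetical helper.

```python
# Sample of the code mapping from the text: stored code -> Cyc collection.
FEATURE_TYPE_CODES = {
    "cave": "#$Cave",
    "other": "#$Place",
    "ruin": "#$RuinedArtifact",
    "unknown": "#$Place",
    "summit": "#$Summit",
    "slope": "#$Slope-Topographical",
    "ridge": "#$Ridge-Hill",
    "ppl": "#$PopulatedPlace",
}


def decode_feature_type(code):
    """Translate a value of the physical "type" field to its logical value.

    Returns None for codes that are not in the sample above.
    """
    return FEATURE_TYPE_CODES.get(code)


if __name__ == "__main__":
    for code in ("ppl", "other", "unknown"):
        print(code, "->", decode_feature_type(code))
```

Note that "other" and "unknown" both translate to #$Place, so the mapping is many-to-one and cannot be inverted to recover the original code.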

Whereas the SQL standard and database management systems conflate the

two at the conceptual level, we distinguish between a physical field and

an arbitrary value of the physical field, and between a logical field

and an abritrary value of the logical field. The values of a physical

field are called physical field indexicals and are denoted in CycL by

the collection #$PhysicalFieldIndexical. Similarly, the values of a

logical field are called logical field indexicals and are denoted in

CycL by the collection #$LogicalFieldIndexical. Exactly one physical

field indexical is created for each physical field, and one logical

field indexical for each logical field. To denote the instances of

#$PhysicalFieldIndexical and #$LogicalFieldIndexical we introduce two

additional functions, #$ThePhysicalFieldValueFn and

#$TheLogicalFieldValueFn. They are the indexical analogues of

#$PhysicalFieldFn and #$LogicalFieldFn and have exactly the same

argument signature. Physical and logical indexicals are related to their

schema using the CycL predicates #$physicalFieldIndexicals and

#$logicalFieldIndexicals respectively, and to their corresponding

physical and logical fields using the CycL predicates

#$indexicalForPhysicalField and #$indexicalForLogicalField

respectively. The sentences and terms for the USGS GNIS example fields are:

(#$physicalFieldIndexicals #$USGS-GNIS-PS (#$ThePhysicalFieldValueFn

#$USGS-GNIS-PS "fid"))

(#$physicalFieldIndexicals #$USGS-GNIS-PS (#$ThePhysicalFieldValueFn

#$USGS-GNIS-PS "name"))

(#$physicalFieldIndexicals #$USGS-GNIS-PS (#$ThePhysicalFieldValueFn

#$USGS-GNIS-PS "type"))

(#$physicalFieldIndexicals #$USGS-GNIS-PS (#$ThePhysicalFieldValueFn

#$USGS-GNIS-PS "state_fips"))

(#$physicalFieldIndexicals #$USGS-GNIS-PS (#$ThePhysicalFieldValueFn

#$USGS-GNIS-PS "county_fips"))

(#$indexicalForPhysicalField (#$PhysicalFieldFn #$USGS-GNIS-PS "fid")

(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS "fid"))

(#$indexicalForPhysicalField (#$PhysicalFieldFn #$USGS-GNIS-PS "name")

(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS "name"))

(#$indexicalForPhysicalField (#$PhysicalFieldFn #$USGS-GNIS-PS "type")

(#$ThePhysicalFieldValueFn #$USGS-GNIS-PS "type"))

(#$indexicalForPhysicalField (#$PhysicalFieldFn #$USGS-GNIS-PS

"state_fips") (#$ThePhysicalFieldValueFn #$USGS-GNIS-PS "state_fips"))

(#$indexicalForPhysicalField (#$PhysicalFieldFn #$USGS-GNIS-PS

"county_fips") (#$ThePhysicalFieldValueFn #$USGS-GNIS-PS "county_fips"))