Re: [ontology-summit] Ontology driven Data Integration using owl:equivalentClass relations

To:	Ontology Summit 2012 discussion <ontology-summit@xxxxxxxxxxxxxxxx>
From:	David Price <dprice@xxxxxxxxxxxxxxx>
Date:	Sat, 8 Feb 2014 13:53:06 +0000
Message-id:	<C3E2C6D1-87FB-47BE-B533-A4354211A4E7@xxxxxxxxxxxxxxx>

On 7 Feb 2014, at 20:53, Patrick Cassidy <pat@xxxxxxxxx> wrote:

There is a serious problem with the suggested methodology:

> Two ontologies or vocabularies (for instance FOAF and Schema.org) include definitions for the same class (or kind) of entity e.g., an Organization,
> and as a consequence we end up with Web accessible documents comprised of RDF statements that describe Organizations as instances of foaf:Organization or schemaorg:Organization.
>
> Challenge: How do we get a merged view of all the organizations, irrespective of how they've been described across various RDF documents?
>
> Solution:
>
> 1. Make a mapping/bridge/meta ontology that uses owl:equivalentClass relations to indicate the fact
> that <http://xmlns.com/foaf/0.1/Organization> and <http://schema.org/Organization> are equivalent.
>
> 2. Load the mapping/bridge/meta ontology document into a data management system that's capable
> of applying reasoning and inference to the equivalence claim based on its comprehension of the relation semantics expressed
>
> 3. Access instances of the <http://xmlns.com/foaf/0.1/Organization> classes (e.g., by seeking a description
> of <http://xmlns.com/foaf/0.1/Organization> which should produce a solution that includes subjects
> of instanceOf (rdf:type) relations) -- and this will show a union of all instances across <http://xmlns.com/foaf/0.1/Organization> and <http://schema.org/Organization>
>
> 4. Reverse the action in step 3 above -- the results should be the same.

If you create an equivalence mapping between entities in independently developed ontologies and try to “reason” with it in any but the most highly restricted manner, you will almost certainly find unintended inferences, likely logical inconsistency, and potentially a great deal of gibberish. When you look at the logical specifications of entities of the same name in different ontologies, they are often quite different, even though the intuition for the meanings may be similar. For “organization”, for example, some ontologies have that as a subtype of “group of people”. In some legal jurisdictions, an Organization can exist without any members – i.e., no people. That can lead to logical contradictions if different definitions are equated. I have never seen “process” defined the same way in any two independent ontologies.
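The "organization" example can be made concrete with a small sketch. Everything in it is invented for illustration: the prefixes a: and b:, the axioms, and the instance ex:ShellCo are hypothetical, and the member list is treated as complete (a closed-world simplification; an OWL reasoner would need an explicit zero-member assertion to derive the contradiction).

```python
# Hypothetical axioms, invented for illustration only.
# Ontology A: every Organization is a GroupOfPeople, and a GroupOfPeople
# must have at least one member.
# Ontology B: an Organization may legally have no members at all.
SUBCLASS_OF = {("a:Organization", "a:GroupOfPeople")}
MIN_MEMBERS = {"a:GroupOfPeople": 1}
EQUIVALENT = {("a:Organization", "b:Organization")}

# A memberless organization described with ontology B's class.
instances = {
    "ex:ShellCo": {"types": {"b:Organization"}, "members": set()},
}


def inferred_types(types, equivalent):
    """Close a set of types under the given equivalences and SUBCLASS_OF."""
    result = set(types)
    changed = True
    while changed:
        changed = False
        for a, b in equivalent:
            # Equivalence is symmetric, so follow it in both directions.
            for x, y in ((a, b), (b, a)):
                if x in result and y not in result:
                    result.add(y)
                    changed = True
        for sub, sup in SUBCLASS_OF:
            if sub in result and sup not in result:
                result.add(sup)
                changed = True
    return result


def violations(instances, equivalent):
    """List (instance, class) pairs whose membership constraint is broken."""
    found = []
    for name, desc in sorted(instances.items()):
        for cls in sorted(inferred_types(desc["types"], equivalent)):
            if len(desc["members"]) < MIN_MEMBERS.get(cls, 0):
                found.append((name, cls))
    return found


print(violations(instances, set()))       # no mapping applied
print(violations(instances, EQUIVALENT))  # with the equivalence mapping
```

Without the mapping the data is unproblematic; once the two classes are equated, ex:ShellCo is inferred to be a group of people while having no members.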

If one only wants to create equivalencies and use that to perform probabilistic or pattern-matching reasoning, that may lead to useful results that can be helpful for the humans who evaluate the results. But don’t expect the kind of accuracy that would be needed to allow the computers to make mission-critical decisions without human intervention. Or, if one only wants to extract some one or two properties of an entity (e.g. the director of a film), equating “film” in two different ontologies may work as intended. But extreme caution is advised.

Our experience in enterprise federation situations is that nothing in the logic languages is sufficient for these purposes. It may be our customers have harder problems than have been discussed so far - they need a merged view (i.e. federation) of existing RDBs, RDF databases, Linked Data, spreadsheets, XML, etc. Gartner calls this kind of thing a "Logical Data Warehouse". Based on projects with life sciences organizations, we're just finishing up an initial product called TopBraid Insight as a first step towards addressing these more difficult scenarios.

I'll explain a bit about TBI simply to point out that the world (or at least the enterprise world TQ inhabits) is far more complex than is evident in the simplistic assumption that equivalentClass will do the job. A variety of flexible options that can be applied to individual cases are needed. For TBI we've had to do a lot of different things, for example:

One key requirement we've uncovered is that the data volumes are so large that, from a practical perspective, you will never see all the data at once. So a reasoner cannot be the solution. We do bump into some simple cases where subClassOf is sufficient, but would not use equivalentClass for the reasons Pat, Matthew and I have mentioned.

To map the source data classes to classes in the neutral ontology (the integrating ontology), a graphical mapping approach called SPINMaps is available. It's just a layer over SPIN/SPARQL, though, which is really the engine. If we need some inference, we write SPIN rules rather than using a reasoner, to do exactly as Pat suggested: to "reason" in "the most highly restricted manner".

To specify that instances are the same, a couple of approaches we use are:

- A LinkMap is a functional mapping that describes the relationship between two instances of different classes in different data sources based on their property values.

- Unlike a LinkMap, which is functional, a LinkSet uses static declarations to relate instances of different classes. This is useful in cases where a functional mapping is not possible or becomes too complex. A LinkSet uses a single predicate (usually skos:exactMatch) to relate a subject resource to an object resource. Note that unlike a LinkMap, a LinkSet assumes a symmetrical relationship.
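The difference between the two approaches can be illustrated with a minimal sketch. This is not TBI's implementation or API: the record layouts, the identifiers (crm:17, erp:A and so on) and the function names are all hypothetical, and matching on a tax identifier is just one assumed example of a property-based link.

```python
# Two hypothetical sources describing companies under different identifiers.
crm_customers = {
    "crm:17": {"name": "Acme Ltd", "vat": "GB123"},
    "crm:42": {"name": "Globex", "vat": "DE999"},
}
erp_suppliers = {
    "erp:A": {"label": "ACME LIMITED", "tax_id": "GB123"},
    "erp:B": {"label": "Initech", "tax_id": "US555"},
}


def link_map(customers, suppliers):
    """Functional link: pairs are computed from property values.

    Here a customer links to the supplier with the same tax identifier.
    """
    by_tax_id = {rec["tax_id"]: key for key, rec in suppliers.items()}
    return {
        (cust, by_tax_id[rec["vat"]])
        for cust, rec in customers.items()
        if rec["vat"] in by_tax_id
    }


# Static link: explicitly declared pairs using a single predicate,
# for cases where no property-based rule is practical.
link_set = {("crm:42", "skos:exactMatch", "erp:Z")}


def linked(a, b, declared):
    """Read the declared pairs symmetrically: (a, b) matches (b, a) too."""
    return any({s, o} == {a, b} for s, _predicate, o in declared)


print(link_map(crm_customers, erp_suppliers))
print(linked("erp:Z", "crm:42", link_set))
```

The computed links follow the data as it changes, whereas the declared pairs have to be maintained by hand but can express matches that no rule would find.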

We separate definition of the federation from its usage to support: 1) the fact that people are interested in different aspects of these federations and 2) the fact that large data volumes mean methods for subsetting and lazy loading the data are also required.

- A workbench defines how data from different data sources comes together and becomes available to the users.

- A workspace is an instance of a workbench used by one or more users to explore some data of interest. As users use the workspace to search and browse data from the diverse data sources, the workspace is dynamically populated with the retrieved results. At any given time, there may be multiple workspaces based on the same workbench—each created to answer different questions and containing different slices of data. This way, TBI provides the ability to dynamically create datamarts focused on particular subsets of data. Each TBI workspace is a Logical Data Warehouse. Users can explore the data, and then come back to a workspace at a later point to continue their exploration. They will continue to see results of their previous searches, because this information is now stored locally to the workspace. They can also clear a workspace to start anew.
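The workbench/workspace split can be sketched roughly as follows. This is only an illustration of the idea of lazily populating a local store from search results, under the assumption that each data source can be modelled as a function from a query to a list of records; the class and method names are hypothetical and are not TBI's API.

```python
class Workbench:
    """Defines which data sources are federated (name -> fetch function)."""

    def __init__(self, sources):
        self.sources = dict(sources)

    def new_workspace(self):
        return Workspace(self)


class Workspace:
    """Holds only the slice of data its users have actually asked for."""

    def __init__(self, workbench):
        self.workbench = workbench
        self.results = {}  # query -> records retrieved so far

    def search(self, query):
        # Go to the sources only the first time a query is seen; later
        # visits are answered from what is stored in the workspace.
        if query not in self.results:
            records = []
            for name, fetch in self.workbench.sources.items():
                records.extend((name, record) for record in fetch(query))
            self.results[query] = records
        return self.results[query]

    def clear(self):
        """Start anew with an empty workspace."""
        self.results.clear()


calls = []  # records every trip to the (fake) source


def fake_rdb(query):
    calls.append(query)
    return [f"row matching {query}"]


bench = Workbench({"rdb": fake_rdb})
ws = bench.new_workspace()
ws.search("aspirin")
ws.search("aspirin")  # answered from the workspace, no second fetch
print(calls)
```

A second workspace created from the same workbench starts empty, so different users can hold different slices of the same federation.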

In summary ... the analogy (because I know MW is a keen sailor) is that when in an ocean of data, it's best to create a harbor with smooth waters where you can do some nice sailing.

Cheers,

David

     Pat

Patrick Cassidy
MICRA Inc.
cassidy@xxxxxxxxx
1-908-561-3416

From: ontology-summit-bounces@xxxxxxxxxxxxxxxx [mailto:ontology-summit-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Andrea Westerinen
Sent: Friday, February 07, 2014 1:17 PM
To: Ontology Summit 2014 discussion
Subject: Re: [ontology-summit] Ontology driven Data Integration using owl:equivalentClass relations

Kingsley, +1 ... Your mapping/bridge/meta ontology is my "integrating ontology". And, you captured the essence extremely well in your demos.

The keys are:
1. Creating the mappings
2. Reasoning with the mappings

Clearly this works over data that is Linked Data or data in ontologies.

--
Andrea Westerinen
T: 425.891.8407
arwesterinen@xxxxxxxxx or andreaw@xxxxxxxxxxx
http://ontolog.cim3.net/cgi-bin/wiki.pl?AndreaWesterinen
organizingknowledge.blogspot.com

On Fri, Feb 7, 2014 at 10:01 AM, Kingsley Idehen <kidehen@xxxxxxxxxxxxxx> wrote:
All,

Starting a new thread based on the theme above to make what I am trying to demonstrate clearer.

Situation:

Schema.org [1] is a collaborative effort aimed at simplifying structured data publication to the Web. As part of this effort, a number of collaborators have collectively produced a number of shared vocabularies under the "schema.org" namespace.

In addition to what's being produced by Schema.org, there are thousands of shared ontologies and vocabularies that have been constructed and published to the Web from a plethora of sources. Many of these have been aggregated by services such as LOV (Linked Open Vocabulary) [2], which basically accentuates the TBox and RBox aspects of the Linked Open Data (LOD) Cloud.

Typical Integration Problem:

Two ontologies or vocabularies (for instance FOAF and Schema.org) include definitions for the same class (or kind) of entity e.g., an Organization, and as a consequence we end up with Web accessible documents comprised of RDF statements that describe Organizations as instances of foaf:Organization or schemaorg:Organization.

Challenge: How do we get a merged view of all the organizations, irrespective of how they've been described across various RDF documents?

Solution:

1. Make a mapping/bridge/meta ontology that uses owl:equivalentClass relations to indicate the fact that <http://xmlns.com/foaf/0.1/Organization> and <http://schema.org/Organization> are equivalent.

2. Load the mapping/bridge/meta ontology document into a data management system that's capable of applying reasoning and inference to the equivalence claim based on its comprehension of the relation semantics expressed

3. Access instances of the <http://xmlns.com/foaf/0.1/Organization> classes (e.g., by seeking a description of <http://xmlns.com/foaf/0.1/Organization> which should produce a solution that includes subjects of instanceOf (rdf:type) relations) -- and this will show a union of all instances across <http://xmlns.com/foaf/0.1/Organization> and <http://schema.org/Organization>

4. Reverse the action in step 3 above -- the results should be the same.
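The four steps above can be sketched in a few lines of plain Python. This is a toy stand-in for a real RDF store and reasoner, not how any particular product implements inference; the two instance IRIs under example.org and the function names are made up for illustration.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
OWL_EQUIVALENT_CLASS = "owl:equivalentClass"
FOAF_ORG = "http://xmlns.com/foaf/0.1/Organization"
SCHEMA_ORG = "http://schema.org/Organization"

# Instance data as it might arrive from two different RDF documents.
data = {
    ("http://example.org/acme", RDF_TYPE, FOAF_ORG),
    ("http://example.org/globex", RDF_TYPE, SCHEMA_ORG),
}

# Step 1: the bridge ontology holds only the equivalence claim.
bridge = {(FOAF_ORG, OWL_EQUIVALENT_CLASS, SCHEMA_ORG)}

# Step 2: load the bridge alongside the data.
merged = data | bridge


def equivalent_classes(cls, triples):
    """Return cls plus every class reachable through owl:equivalentClass,
    treating the relation as symmetric and transitive."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != OWL_EQUIVALENT_CLASS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


def instances_of(cls, triples, inference=False):
    """Subjects typed as cls; with inference, also as any equivalent class."""
    classes = equivalent_classes(cls, triples) if inference else {cls}
    return {s for s, p, o in triples if p == RDF_TYPE and o in classes}


# Step 3: query one class; step 4: query the other and compare.
print(instances_of(FOAF_ORG, merged))
print(instances_of(FOAF_ORG, merged, inference=True))
print(instances_of(SCHEMA_ORG, merged, inference=True))
```

With inference off, each class returns only its own instance; with inference on, both queries return the same union.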

Live Demo Link:

[1] http://lod.openlinksw.com/describe/?url=""> -- description of <http://xmlns.com/foaf/0.1/Organization> *without inference and reasoning enabled*, so the relations presented are specific to the aforementioned class.

[2] http://lod.openlinksw.com/describe/?url=""> -- description of <http://schema.org/Organization> *without inference and reasoning enabled*, so the relations presented are specific to the aforementioned class.

[3] http://lod.openlinksw.com/describe/?url=""> -- description of <http://xmlns.com/foaf/0.1/Organization> *with inference and reasoning enabled*.

[4] http://lod.openlinksw.com/describe/?url=""> -- description of <http://schema.org/Organization> with *inference and reasoning enabled*.

--

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

_________________________________________________________________
Msg Archives: http://ontolog.cim3.net/forum/ontology-summit/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontology-summit/
Unsubscribe: mailto:ontology-summit-leave@xxxxxxxxxxxxxxxx
Community Files: http://ontolog.cim3.net/file/work/OntologySummit2014/
Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014
Community Portal: http://ontolog.cim3.net/wiki/

