Re: [ontolog-forum] Solving the information federation problem

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Ed Barkmeyer <edbark@xxxxxxxx>
Date: Fri, 28 Oct 2011 16:18:31 -0400
Message-id: <4EAB0E17.1090208@xxxxxxxx>

David Price wrote:
> Hi Ed,
> As usual ... Wow! I may try to dissect this at some point, but am in the
> middle of a big information federation project with a reference ontology
> using semantic mediation languages that may become W3C standards ... so
> may not have time:-)
> My one sentence summary of your email 'It's too early to make standards
> for information federation as a whole, and there are already enough
> standards in place for the components' ... about right?

I think perhaps we are on different wavelengths.  "Information 
federation" can mean a great many things -- everything from hrefs in 
HTML to the Grand Universal Ontology for Medicine or Google searches.  
If you pick a particular kind of information federation, and perhaps 
particular categories of information collections, there may or may not 
be sufficient standards to enable it.

I was specifically addressing the question of 'semantic mediation' -- 
creating a reference ontology to convert data sets into facts about a 
possible world that ontology can be used to describe, and then 
converting (some of) those facts into a data set using a different 
structure and representation.  It is the kind of thing the distributed 
database tools did in the 1980s, using an integrated or federated 
conceptual schema as a "reference ontology", and the kind of thing EAI 
tools of the 1990s did, using some 'integrated object model' or 
'integrated information model' as the "reference ontology".   When XML 
became the lingua franca of roll-your-own interface standards, James 
Clark developed XSLT as the substitute for a semantic mediation 
language.  The XSLT program implements the software engineer's 
understanding of the semantic equivalence as a direct transform from 
data set to data set.  But at the same time, assorted projects have been 
trying to do both federation -- building a common knowledge base -- and 
mediation -- transforming information from one shape to another -- using 
reference ontologies of the OWL kind, or of the Cyc kind, or of the CLIF 
kind, or their own ontology language.
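
A minimal sketch of the mediation round trip described above, written in
Python purely for illustration.  The record fields (ord_no, cust, qty, uom),
the property names (hasBuyer, hasQuantity, hasUnit) and the target layout
are all assumptions made up for this example; they come from no real schema
or reference ontology.  The point is only that the source data is first
interpreted into facts, and the target data set is then encoded from those
facts, rather than being transformed directly from data set to data set as
an XSLT program would do.

```python
# Hypothetical sketch: mediate one record from a source shape to a target
# shape by way of facts stated in the terms of a toy reference ontology.
# Every field and property name here is illustrative, not a real vocabulary.

def interpret(source_record):
    """Interpretation step: source data -> facts (data to concept).

    Facts are (subject, property, value) triples.
    """
    order_id = f"order:{source_record['ord_no']}"
    return {
        (order_id, "hasBuyer", source_record["cust"]),
        (order_id, "hasQuantity", int(source_record["qty"])),
        # Assumed default: a missing unit of measure means "each".
        (order_id, "hasUnit", source_record.get("uom", "each")),
    }


def encode(facts):
    """Encoding step: facts -> target data (concept to data)."""
    subject = next(iter(facts))[0]  # all facts here share one subject
    by_property = {prop: value for (_, prop, value) in facts}
    return {
        "OrderID": subject.split(":", 1)[1],
        "Buyer": {"Name": by_property["hasBuyer"]},
        "Quantity": {
            "value": by_property["hasQuantity"],
            "unitCode": by_property["hasUnit"],
        },
    }


def mediate(source_record):
    """Full mediation: interpret into facts, then encode from the facts."""
    return encode(interpret(source_record))


result = mediate({"ord_no": "A17", "cust": "Acme", "qty": "12"})
```

Note that interpret and encode are written separately and neither refers to
the other's data format; only the facts connect them.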

In my experience, there is no agreement on the mechanism and practice 
for relating an XML Schema to a reference ontology for the same domain.  
So we can't make a standard for what the knowledge engineer sees in 
participating in the semi-automated process of creating the mapping -- 
the mapping language.  We have already standardized the input -- the XML 
Schema form and the RO form.  And standardizing what the output looks 
like -- the mapping exchange format -- is not necessary unless you are 
expecting one supplier to provide the semi-automated mapping tooling 
and another supplier to provide the mapping implementation engine that 
actually converts the data.  And no one does that.
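
To make the 'mapping exchange format' concrete, here is a hypothetical
sketch in Python.  The JSON layout, the function names (export_rules,
apply_rules, get_path) and the sample paths and properties are invented for
illustration and correspond to no existing standard or product.  It shows
the one situation in which such a format would matter: one tool writes the
rules, and a separate engine reads and applies them.

```python
import json

# Hypothetical exchange format: each rule links a path in the exchange
# schema to a property in the reference ontology.  Not a real standard.


def export_rules(rules):
    """Mapping-tool side: serialize the rule set as JSON text."""
    return json.dumps({"version": 1, "rules": rules}, sort_keys=True)


def get_path(record, path):
    """Follow a slash-separated path through nested dictionaries."""
    for key in path.split("/"):
        record = record[key]
    return record


def apply_rules(exported, record, subject):
    """Engine side: load the exported rules and emit facts as triples."""
    rules = json.loads(exported)["rules"]
    return [
        (subject, rule["property"], get_path(record, rule["source_path"]))
        for rule in rules
    ]


rules = [
    {"source_path": "Buyer/Name", "property": "hasBuyer"},
    {"source_path": "Quantity/value", "property": "hasQuantity"},
]
exported = export_rules(rules)
facts = apply_rules(
    exported,
    {"Buyer": {"Name": "Acme"}, "Quantity": {"value": 12}},
    "order:A17",
)
```

If the same supplier builds both sides, the exported text is a private
detail and can change at will, which is the situation described above.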

> I mostly agree
> with that. However, the question I was addressing  (which may not be the
> question in which you are interested, but seems of interest to Cory) was
> 'If in the next 5 years there need to be standards in support of
> information federation, where should they be placed?' and nothing in
> this reply does anything but strengthen the view that the W3C is the
> most sensible answer.

UML is a standard that supports 'information federation', and so are 
HTML,  XML Schema and SQL, and people have been doing it, in various 
ways, for 30 years.  So I suggest that you guys need to be a little 
clearer on the definition and scope of the term 'information 
federation', and then maybe you will be right about W3C.  In another 
way, concentrated efforts to produce domain ontologies, like OAGIS v9 or 
STEP AP 242, might be more valuable than any W3C supporting technology.  
But the problem with most of that work is that it is captured in XML 
Schema, because that was the successful W3C modeling language, not OWL.  
And we can argue about where upper ontologies, like DOLCE, Cyc and SUMO, 
fit into the 'information federation' picture.

If you are trying to semi-automatically construct a domain ontology 
using UML models, EXPRESS models and XML Schemas and their embedded 
annotations, or other natural language text, as sources, that is an 
entirely different 'information federation' problem from relating 
existing data sets to a reference ontology, or to each other.  And if 
you are trying to construct one consistent ontology from several 
ontologies or models made from different viewpoints, that is yet another 
information integration problem.

One of the political problems of the OMG AESIG is that each of the 
members has his own favorite bag of separable model integration 
problems, and they have tried for two years to find a common solution -- 
one ring to rule them all.  That having failed twice, the next step 
might be to try to sort out and document the different problems, as 
stated in the viewpoints of the proposers, and then try to do a little 
viewpoint model integration in developing a plan of action.

Similarly, if Cory wants the Ontology Summit to discuss 'information 
federation', that will be 7 distantly related discussions.  If he wants 
to discuss ontologizing information/data models, or linking ontologies, 
or semantic mediation, those are much more focused problem spaces.

> You may not approve of all the W3C WG activities
> as they don't fit into the NIST semantic mediation architecture view,
> but these activities are pushing in the right direction and are doing
> far more to add to the components of a solution than other standards
> activities of which I'm aware.

Well, I don't know what 'information integration' problem you are solving. 
I assumed that Cory was talking about 'semantic mediation', because that 
was the gist of the SIMF RFP.
There are lots of other choices.

> FWIW ... I'm not even convinced a semantic mediation tools suite is
> what's actually required. In order to make progress, making things
> clearly specified for humans (but computer navigable) independent of any
> specific tool suite could be a big step in the right direction, so
> something more like the loosely-coupled, but semantics-driven IDIOM
> framework might be a more reasonable place to start.

Making things 'clearly specified for humans' is a very important 
point.   The W3C "web heads" are all about making information easy for 
humans to find and relate.  The assumption is that you can link anything 
to anything and the browser will have a plug-in that allows whatever you 
linked to to be displayed to the human reader.  The "data heads", like 
me, are all about making information useable by software to make 
decisions without involving humans in the decision process unnecessarily 
(or in some cases, necessarily not involving humans in the decision 
process).  Data heads care what form the linked information is in, 
because the software has to know the nature of the information and be 
able to extract the information from that form into a form the software 
can use.  Those goals are not closely related.  The value of ontologies 
for webheads is that they make it easier to filter the information 
sources automatically (a decision process) and display what the human 
wants to see.  The value of ontologies for data heads is that they 
represent knowledge and intent in a form that software can use directly 
to make intelligent decisions, or to assist humans in making those 
decisions.  For web heads, ontology is a potentially useful tool that 
supports some of their goals; for data heads, ontology (in some form) is 
a vital aspect of the mechanism for achieving most of their goals.  Web 
heads see the human interface as providing knowledge to the human; data 
heads see the human interface as acquiring knowledge from the human.  
(W3C has both; OMG only has data heads.)

-Ed

> Anyway, back to my real job...
> Cheers,
> David
> On 10/28/2011 1:10 AM, Ed Barkmeyer wrote:
>> David Price wrote:
>>> There are of course things that organizations can do to start improving
>>> the situation, but they have little to do with Ontolog-typical concerns
>>> and so I doubt that the Ontolog Forum is the place to 'get on with' this
>>> problem.
>>> I think it's pretty clear now that the OMG cannot do it either - as has
>>> been proven by the lack of progress on SIMF despite a valiant effort on
>>> your part. FWIW it's very hard to push through the OMG 'everything is a
>>> meta-model' and 'vested interests' barriers. Luckily, it seems to me
>>> that a new language is actually pretty far down the list of important
>>> mechanisms/approaches wrt information federation anyway.
>> Well, I can agree to some extent.  The problem that OMG has in this
>> regard is that Cory is pushing for a *standard* that supports 'semantic
>> integration tools', and he can't name one.  I pointed out then that, in
>> spite of 2 EU FP6 projects and millions of euros invested in this, the
>> result was only weak academic tooling, and the three collections of
>> tools I saw chose different organizations and different integrating
>> mechanisms.  The OMG Telecomm group put out an RFI for the current state
>> of the art in semantic integration tools and got only one response, from
>> Cory's AESIG.  NIST itself is now on its 4th project in trying to define
>> a feasible toolset for some known mediation problems.  Part of the
>> difficulty is in agreeing on what the modules should be and do, and part
>> of the difficulty is agreeing on an adequate form for the integrating
>> model.
>> But the main problem is simply that it is easier to build a one-off
>> mapping of your business data from representation1 to representation2
>> using XSLT or Java, than to learn to use, and use, the tools to create
>> the ontology for your business data and the tools to map the XML schemas
>> to the ontology and the tools to perform the runtime transformations.
>> You have to see a broader, longer-term value to the reference ontology
>> to realize any value at all from the extra work.
>> And the reference ontology has to be able to capture the rules of usage
>> that you will write into your XSLT script.  OWL can't.  RDF can, if the
>> tool provider invents enough special vocabulary, but what modeling tool
>> will you use to create the RDF ontology?  UML with stereotypes and OCLv2
>> can, but it isn't any easier to write OCL than Java.  So there is a
>> serious practical barrier to getting /useable/ and /cost-effective/
>> semantic mediation tooling.  And that is why there are not lots of
>> commercial tools.
>> Yes, there is enormous value to be realized, IF you can figure out how
>> to create it.  We at NIST justify our work in this area as 'research',
>> because we have not yet seen a tool set that is even effective, without
>> getting into useable or cost-effective.  And OMG has been given to
>> understand that the IBM evaluation of the situation is similar.  So I
>> applaud Cory's idea that this could be an interesting topic for the
>> Ontology Summit, if nothing more than to get a clearer handle on the
>> state of the art in semantic mediation in 2012.  The state of the
>> practice is nearly non-existent, which is why a standards project is of
>> doubtful value.
>>> Cory, this problem belongs in the W3C.  I suggested that to you
>>> previously, and the events of the past year have made that fact even
>>> more clear in my mind - the solution has to be based in Web and Internet
>>> standards and technologies.
>> That is certainly true, but all of OMG, W3C, OASIS and other bodies are
>> working on solutions to various problems based on XML and XML Schema and
>> WSDL/SOAP, and all their dialects and add-ons, which is the meaning of
>> 'Web and Internet standards'.  Then we come to who is actually working
>> on solutions using OWL and RDF, and suddenly we have a much smaller and
>> more scattered contingent, but there are active committees in all of
>> those, and all in various states of disorganization.
>> I don't see that W3C is a better choice.  The W3C RIF project, for
>> example, had the problem of having to work with OWL and having to work
>> with SPARQL, because those were the W3C invested technologies, even
>> though none of the non-academic rules engines, and at most half the
>> academic ones, had anything to do with either one.  (David's employer
>> falls into the non-academic category; TopQuadrant support for OWL was an
>> afterthought.)  In short, going to W3C just begets a different set of
>> politics and prejudices.
>> The problem is not what technologies to use, or where to do the
>> standards work.  The problem is to have a community that has semantic
>> mediation tooling and is interested in getting a standard to enable some
>> tools to work together.  All of the tool sets I have seen perform the
>> entire mediation function.  They need to be able to read XML schemas,
>> and ASN.1 schemas (in HL7), and EDI schemas (in many business
>> applications), and EXPRESS schemas (in manufacturing and construction),
>> and read and write the corresponding standard message forms.  They need
>> to have an internal representation for the integrating model (aka
>> reference ontology), and they probably rely on some off-the-shelf
>> modeling tools to provide the input from which that model is created.
>> It may be advantageous to convert UML to OWL or vice versa, and they
>> probably need to add UML stereotypes or something the like to mark up
>> the incoming model to meet their internal needs for the content of the
>> reference ontology.  In addition, they need a runtime capability that is
>> based on a central engine with interface and schema plugins on the input
>> side and the output side, and the semantic maps and reference ontology
>> as inputs.
>> Now given that you are building a semantic mediation tool suite, you
>> have a list of tool components (which the last draft of the SIMF RFP was
>> still not clear on):  reference ontology creation tool, semantic mapping
>> creation tool, general runtime conversion engine, semantic mapping tool
>> plugins for XML schema, EDI, ASN.1, EXPRESS (according to your target
>> market), runtime plugins for the schemas and the corresponding data
>> encodings for input and output, and runtime plugins for WSDL/SOAP and
>> ebMS, and probably other protocols (again depending on target market).
>> If you build all the tool components as part of your suite, the only
>> standards you need are the existing standards for the schemas and the
>> data forms.
>> There are already standards for all schemas and encodings, and there are
>> probably open source libraries for reading both and writing encodings.
>> Unless you want to standardize the Java APIs for that, there is no
>> opportunity for standards there.
>> Similarly, you probably want the reference ontology creation tool to be
>> some off-the-shelf product of a vendor that does that kind of thing
>> well, and spits out some standard form, like UML XMI or OWL/RDF or RDF
>> or CLIF (if John Sowa has convinced anyone).  Alternatively, you could
>> probably use one of these do-it-yourself graphical DSL tools to make
>> your own tool, and then use your own internal reference ontology format
>> as the direct output of your tool.  In either case, however, you don't
>> need a standard, unless you need a new language.
>> Finally, you will need a tool that can take an exchange schema in its
>> left hand, and a reference ontology in its right hand, and enable the
>> domain expert to define the links between the model elements, path to
>> path.  This is the critical Semantic Mediation Rules Tool.  And you need
>> to define two sets of links -- one is an interpretation rule: data to
>> concept; the other is an encoding rule: concept to data.  They are not
>> always symmetric, because the starting points are usually different.
>> The Semantic Mediation Rules Tool needs to record and export the mapping
>> rules it generates, because those rulesets are the critical input to the
>> runtime engine -- the Mediator.  If you expect that one organization
>> will build a Semantic Mediation Rules Tool that can be used by someone
>> else's Mediator, you need a standard for the representation of semantic
>> mediation rules.  If not, then not.  Does any commercial or academic
>> project not envisage building both the Rules Tool and the Mediator as
>> part of its toolkit?  None that I know of.  Why would you?  Is there any
>> reason to create a standard for communication between my Rules Tool and
>> my Mediator?  Not only is it my design choice, it is my IP, and I can
>> improve my capabilities by improving the capabilities of that interface
>> whenever I discover a new and exciting feature that I can add.  And I
>> might find it useful to patent my design.  The last thing I want is a
>> standard.
>> In summary, there is the issue of defining a standard architecture, but
>> we would have to do that before trying to standardize any of the
>> interfaces.  It strikes me that a useful output of the OMG AESIG would
>> be the whitepaper that clearly defines the semantic mediation
>> architecture and assesses the opportunities for standardization, rather
>> than an RFP for several not clearly necessary standards.
>> I see only three areas for interface standardization:
>>   - the form of the reference ontology that is input and presented at the
>> interface between the human knowledge engineer and the reference
>> ontology capturing tool.  It is probably a combined graphical and text
>> form, a la UML+OCL, or OWL+RDF.
>>   - the form of the reference ontology that is exported by the capturing
>> tool for use by other tools, including but not limited to the Semantic
>> Mediation Rules tool and the Mediator.  It is probably an RDF dialect.
>> What all is captured here, or can be captured here, has some impact on
>> the capabilities and possible behaviors of the Semantic Mediation Rules
>> tool.  So this interface may be an important part of the tool-builder
>> IP.  If there were enough experience to know what all might be useful to
>> express, you could get agreement on a standard, even though most tools
>> would only be able to use some of it.  Most importantly, however, a
>> standard in this area that is not just a UML profile, or something the
>> like, would require the toolsmith to build some kind of back-end for the
>> off-the-shelf UML or OWL tool that is the primary ontology input tool.
>> And I would expect that many semantic mediation toolkits might just
>> assume that a UML or OWL tool can be used and will generate the standard
>> XMI or RDF formats.
>>   - the form of mediation rules that is input and presented at the
>> interface between the knowledge engineer and the Semantic Mediation
>> Rules tool.  This is an area that is by no means ripe for
>> standardization, because the workings of this tool are very different in
>> various designs.  Part of the rules generation process can be automated,
>> and part of it requires human input, and how much is which, and how the
>> automation is enabled, and what algorithms it uses, and how complex the
>> executable rules for the Mediator can be, are all design decisions.
>> This is a primary area of tool-builder IP.
>> So, IMO, the big question is what the form of the reference ontology
>> is.  Do we need a new language for creating them?  Do we need a set of
>> RDF additions to OWL, or a UML Profile for Reference Ontologies?  If we
>> don't need a new language at all, then we already have all the standards
>> we need, and we need to get some experience with commercial tools.
>> If we need a new language, then we also need to standardize its export
>> form.  A UML profile can be processed by off-the-shelf UML tools and the
>> models can be exported in XMI.  Similarly, an RDF add-on to OWL might be
>> supported by an extension to an existing OWL tool and exported as
>> described in OWL/full.  (Clark/Parsia are already doing this kind of
>> thing with Pellet.)  CLIF may be a desirable export form for some
>> mediation tools, but it is a highly undesirable input form for knowledge
>> engineers working with domain experts.  Domain experts can glean most of
>> the content of UML and graphical OWL models with a little experience,
>> but CLIF is about as intelligible as OWL/RDF or XMI or Old Church
>> Slavonic.  A wholly new language requires a new set of tools and
>> standards for both ends; a CLIF tool requires a new input form.  (One of
>> the failures of OMG SBVR is that it exemplifies a possibly viable input
>> form for rules and definitions that it does not standardize, and then
>> standardizes an output form that merely competes with OCL and CLIF/IKL
>> -- a new kind of train on existing tracks with no doors for the passengers.)
>> And at this time in history, I think the standardization of input to the
>> Mediation Rules generator would be a mistake.  There is no agreement on
>> how to generate such rules, or even what capabilities of the Mediator
>> they must drive.  So, let us by all means have conferences and
>> whitepapers on the subject, but please not as standards development
>> projects.
>>> The Government Linked Open Data WG and the
>>> RDB2RDF WG are examples of practical things happening in the W3C that
>>> will hopefully make some real progress possible. More of that kind of
>>> thing, perhaps more focused at this particular problem, seems like the
>>> only practical way forward to me.
>> Linked Open Data is the latest in a long line of webheaded information
>> integration technologies, which is in no way related to semantic
>> mediation, as far as I can tell.  RDB2RDF is a knowledge-free technical
>> transformation of SQL relational database schemas to RDF Schema + SQL
>> RDF dialect.  The object seems to be to allow the implementors of
>> triple-store databases to use real industrial information that is stored
>> in relational data management systems in a predictable way.  It is
>> almost the antithesis of semantic mediation, in which the objective is
>> to relate the database-engineered SQL schema to a knowledge-engineered
>> domain ontology.  But it is the case that some mediation tools take
>> exactly the RDB2RDF approach to the semantic mapping process, and similar
>> projects use XML Schema as the basis.  And let us not forget that Cory
>> is working on the OMG MOF2RDF standard to make standard RDF export forms
>> for UML models and BPMN models, etc., as RDF Schema + MOF RDF dialect.
>> This is exactly why W3C is not a better place.  I don't think we want
>> semantic integration standards to be strongly influenced by RDB2RDF or
>> Linked Open Data, any more than we want them to be influenced by MOF and
>> SBVR and UML.
>> I suggest that we can make better progress by getting a whitepaper out
>> there that identifies the architecture, standardizes a component and
>> interface nomenclature, discusses the state of the art in mediation
>> technology and the opportunities for standardization.  And I strongly
>> agree that the Ontolog Summit could contribute to the 'state of the art
>> in mediation technology' part, which is critical to the assessment of
>> opportunities for standardization.  The AESIG has been so busy trying to
>> generate acceptable RFPs that it has lost sight of its primary value as
>> an Architecture Board SIG -- to provide education on the technology and
>> guidance on the development of a program of work in this area.
>> -Ed
>> P.S.  In spite of NIST's strong interest in semantic mediation, we
>> (primarily I, no surprise there) have been a thorn in Cory's side since
>> the beginning of the SIMF RFP effort.  But I believe the suggestion for
>> a workshop topic for the Ontolog Summit is a much more valuable step, on
>> the way to the whitepaper that would form the basis for any kind of
>> standardization plan, and by-the-by serve as a reference terminology for
>> the emerging papers on the subject.  Part of the reason why the EU had 3
>> different INTEROP projects doing semantic mediation (all differently) is
>> that none of them used the same terms to describe what they were doing.
> --
> Managing Director and Consultant
> TopQuadrant Limited. Registered in England No. 05614307
> UK +44 7788 561308
> US +1 336-283-0606
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J

Edward J. Barkmeyer                        Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263                Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263                Cel: +1 240-672-5800

"The opinions expressed above do not reflect consensus of NIST, 
 and have not been reviewed by any Government authority."
