Re: [ontolog-forum] Solving the information federation problem

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>, "simf-rfp@xxxxxxx" <simf-rfp@xxxxxxx>
From: Ed Barkmeyer <edbark@xxxxxxxx>
Date: Thu, 27 Oct 2011 20:10:54 -0400
Message-id: <4EA9F30E.4080800@xxxxxxxx>


David Price wrote:
> There are of course things that organizations can do to start improving 
> the situation, but they have little to do with Ontolog-typical concerns 
> and so I doubt that the Ontolog Forum is the place to 'get on with' this 
> problem.
>
> I think it's pretty clear now that the OMG cannot do it either - as has 
> been proven by the lack of progress on SIMF despite a valiant effort on 
> your part. FWIW it's very hard to push through the OMG 'everything is a 
> meta-model' and 'vested interests' barriers. Luckily, it seems to me 
> that a new language is actually pretty far down the list of important 
> mechanisms/approaches wrt information federation anyway.
>

Well, I can agree to some extent.  The problem that OMG has in this 
regard is that Cory is pushing for a *standard* that supports 'semantic 
integration tools', and he can't name one.  I pointed out then that, in 
spite of 2 EU FP6 projects and millions of euros invested in this, the 
result was only weak academic tooling, and the three collections of 
tools I saw chose different organizations and different integrating 
mechanisms.  The OMG Telecomm group put out an RFI for the current state 
of the art in semantic integration tools and got only one response, from 
Cory's AESIG.  NIST itself is now on its 4th project in trying to define 
a feasible toolset for some known mediation problems.  Part of the 
difficulty is in agreeing on what the modules should be and do, and part 
of the difficulty is agreeing on an adequate form for the integrating 
model.

But the main problem is simply that it is easier to build a one-off 
mapping of your business data from representation1 to representation2 
using XSLT or Java, than to learn to use, and use, the tools to create 
the ontology for your business data and the tools to map the XML schemas 
to the ontology and the tools to perform the runtime transformations.  
You have to see a broader, longer-term value to the reference ontology 
to realize any value at all from the extra work.

And the reference ontology has to be able to capture the rules of usage 
that you will write into your XSLT script.  OWL can't.  RDF can, if the 
tool provider invents enough special vocabulary, but what modeling tool 
will you use to create the RDF ontology?  UML with stereotypes and OCLv2 
can, but it isn't any easier to write OCL than Java.  So there is a 
serious practical barrier to getting /useable/ and /cost-effective/
semantic mediation tooling.  And that is why there are not lots of
commercial tools.

Yes, there is enormous value to be realized, IF you can figure out how 
to create it.  We at NIST justify our work in this area as 'research', 
because we have not yet seen a tool set that is even effective, without 
getting into useable or cost-effective.  And OMG has been given to 
understand that the IBM evaluation of the situation is similar.  So I 
applaud Cory's idea that this could be an interesting topic for the 
Ontology Summit, if nothing more than to get a clearer handle on the 
state of the art in semantic mediation in 2012.  The state of the 
practice is nearly non-existent, which is why a standards project is of 
doubtful value.

> Cory, this problem belongs in the W3C.  I suggested that to you 
> previously, and the events of the past year have made that fact even 
> more clear in my mind - the solution has to be based in Web and Internet 
> standards and technologies.

That is certainly true, but all of OMG, W3C, OASIS and other bodies are 
working on solutions to various problems based on XML and XML Schema and 
WSDL/SOAP, and all their dialects and add-ons, which is the meaning of 
'Web and Internet standards'.  Then we come to who is actually working 
on solutions using OWL and RDF, and suddenly we have a much smaller and
more scattered contingent, but there are active committees in all of
those, and all in various states of disorganization.

I don't see that W3C is a better choice.  The W3C RIF project, for 
example, had the problem of having to work with OWL and having to work 
with SPARQL, because those were the W3C invested technologies, even 
though none of the non-academic rules engines, and at most half the 
academic ones, had anything to do with either one.  (David's employer 
falls into the non-academic category; TopQuadrant support for OWL was an
afterthought.)  In short, going to W3C just begets a different set of
politics and prejudices.

The problem is not what technologies to use, or where to do the 
standards work.  The problem is to have a community that has semantic 
mediation tooling and is interested in getting a standard to enable some 
tools to work together.  All of the tool sets I have seen perform the 
entire mediation function.  They need to be able to read XML schemas, 
and ASN.1 schemas (in HL7), and EDI schemas (in many business 
applications), and EXPRESS schemas (in manufacturing and construction), 
and read and write the corresponding standard message forms.  They need 
to have an internal representation for the integrating model (aka 
reference ontology), and they probably rely on some off-the-shelf 
modeling tools to provide the input from which that model is created.  
It may be advantageous to convert UML to OWL or vice versa, and they 
probably need to add UML stereotypes or the like to mark up
the incoming model to meet their internal needs for the content of the 
reference ontology.  In addition, they need a runtime capability that is 
based on a central engine with interface and schema plugins on the input 
side and the output side, and the semantic maps and reference ontology 
as inputs.
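
As a rough illustration of the runtime arrangement described above, here is a
minimal Python sketch: a central engine, format plugins on the input and output
sides, and the semantic maps plus reference ontology supplied as inputs.  Every
name in it (Mediator, read_kv, write_json, the paths and concepts) is a
hypothetical stand-in, not taken from any actual tool suite, and a real engine
would handle structured data rather than flat path/value records.

```python
import json
from typing import Callable, Dict, Iterable

Record = Dict[str, str]  # assumption: a message flattened to {path: value}


class Mediator:
    """Hypothetical central engine with input-side and output-side plugins."""

    def __init__(self, concepts: Iterable[str],
                 interpretation_map: Dict[str, str],
                 encoding_map: Dict[str, str]) -> None:
        self.concepts = set(concepts)                 # reference ontology terms
        self.interpretation_map = interpretation_map  # source path -> concept
        self.encoding_map = encoding_map              # concept -> target path
        self.readers: Dict[str, Callable[[str], Record]] = {}
        self.writers: Dict[str, Callable[[Record], str]] = {}

    def mediate(self, message: str, source_format: str, target_format: str) -> str:
        record = self.readers[source_format](message)
        # Interpret: lift source data up to concepts of the reference ontology.
        facts = {}
        for path, value in record.items():
            concept = self.interpretation_map.get(path)
            if concept in self.concepts:
                facts[concept] = value
        # Encode: push the concepts back down into the target schema's paths.
        encoded = {self.encoding_map[c]: v
                   for c, v in facts.items() if c in self.encoding_map}
        return self.writers[target_format](encoded)


def read_kv(message: str) -> Record:
    """Toy input plugin for 'path=value;path=value' messages."""
    return dict(pair.split("=", 1) for pair in message.split(";") if pair)


def write_json(record: Record) -> str:
    """Toy output plugin that serializes the record as JSON."""
    return json.dumps(record, sort_keys=True)


mediator = Mediator(
    concepts=["Customer.name", "OrderLine.quantity"],
    interpretation_map={"PO.buyer": "Customer.name",
                        "PO.qty": "OrderLine.quantity"},
    encoding_map={"Customer.name": "order/customer",
                  "OrderLine.quantity": "order/quantity"},
)
mediator.readers["kv"] = read_kv
mediator.writers["json"] = write_json

result = mediator.mediate("PO.buyer=Acme;PO.qty=12", "kv", "json")
print(result)
```

Supporting another schema language or protocol then means registering another
reader or writer, while the engine itself stays unchanged.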

Now given that you are building a semantic mediation tool suite, you 
have a list of tool components (which the last draft of the SIMF RFP was 
still not clear on):  reference ontology creation tool, semantic mapping 
creation tool, general runtime conversion engine, semantic mapping tool 
plugins for XML schema, EDI, ASN.1, EXPRESS (according to your target 
market), runtime plugins for the schemas and the corresponding data 
encodings for input and output, and runtime plugins for WSDL/SOAP and 
ebMS, and probably other protocols (again depending on target market).  
If you build all the tool components as part of your suite, the only 
standards you need are the existing standards for the schemas and the 
data forms.

There are already standards for all schemas and encodings, and there are 
probably open source libraries for reading both and writing encodings.  
Unless you want to standardize the Java APIs for that, there is no 
opportunity for standards there.

Similarly, you probably want the reference ontology creation tool to be 
some off-the-shelf product of a vendor that does that kind of thing 
well, and spits out some standard form, like UML XMI or OWL/RDF or RDF 
or CLIF (if John Sowa has convinced anyone).  Alternatively, you could 
probably use one of these do-it-yourself graphical DSL tools to make 
your own tool, and then use your own internal reference ontology format 
as the direct output of your tool.  In either case, however, you don't 
need a standard, unless you need a new language.

Finally, you will need a tool that can take an exchange schema in its 
left hand, and a reference ontology in its right hand, and enable the 
domain expert to define the links between the model elements, path to 
path.  This is the critical Semantic Mediation Rules Tool.  And you need 
to define two sets of links -- one is an interpretation rule: data to 
concept; the other is an encoding rule: concept to data.  They are not 
always symmetric, because the starting points are usually different.  
The Semantic Mediation Rules Tool needs to record and export the mapping 
rules it generates, because those rulesets are the critical input to the 
runtime engine -- the Mediator.  If you expect that one organization 
will build a Semantic Mediation Rules Tool that can be used by someone 
else's Mediator, you need a standard for the representation of semantic 
mediation rules.  If not, then not.  Does any commercial or academic 
project not envisage building both the Rules Tool and the Mediator as 
part of its toolkit?  None that I know of.  Why would you?  Is there any 
reason to create a standard for communication between my Rules Tool and 
my Mediator?  Not only is it my design choice, it is my IP, and I can 
improve my capabilities by improving the capabilities of that interface 
whenever I discover a new and exciting feature that I can add.  And I 
might find it useful to patent my design.  The last thing I want is a 
standard.
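
To make the two sets of links concrete, here is a small Python sketch of what a
Rules Tool might record and export for a Mediator to consume.  The LinkRule
shape, the JSON export form, and the schema paths are all assumptions made for
illustration; no existing tool or standard is being described.  The example
also shows one way the two directions can fail to be symmetric: two source
paths interpret to the same concept, so the encoding rules cannot be obtained
simply by inverting the interpretation rules.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LinkRule:
    kind: str     # "interpretation" (data -> concept) or "encoding" (concept -> data)
    path: str     # path into the exchange schema (hypothetical)
    concept: str  # path into the reference ontology (hypothetical)


# Interpretation rules for an incoming order schema.
interpretation = [
    LinkRule("interpretation", "Order/ShipTo/Name", "Party.name"),
    LinkRule("interpretation", "Order/BillTo/Name", "Party.name"),
]
# Encoding rules for a different, outgoing invoice schema.
encoding = [
    LinkRule("encoding", "Invoice/Customer", "Party.name"),
]


def export_rules(rules: List[LinkRule]) -> str:
    """Serialize a ruleset; the exported form is what a standard would pin down."""
    return json.dumps([asdict(rule) for rule in rules], indent=2)


def import_rules(text: str) -> List[LinkRule]:
    """Load a ruleset on the Mediator side."""
    return [LinkRule(**item) for item in json.loads(text)]


def naive_inverse(rules: List[LinkRule]) -> Dict[str, str]:
    """Invert data->concept links; a later rule silently overwrites an earlier one."""
    table: Dict[str, str] = {}
    for rule in rules:
        table[rule.concept] = rule.path
    return table


ruleset = interpretation + encoding
round_tripped = import_rules(export_rules(ruleset))
inverse = naive_inverse(interpretation)
print(inverse)
```

If the Rules Tool and the Mediator come from the same builder, the exported
form is a private design choice; only when they come from different builders
does the format itself need agreement.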

In summary, there is the issue of defining a standard architecture, but 
we would have to do that before trying to standardize any of the 
interfaces.  It strikes me that a useful output of the OMG AESIG would 
be a whitepaper that clearly defines the semantic mediation
architecture and assesses the opportunities for standardization, rather
than an RFP for several standards that are not clearly necessary.

I see only three areas for interface standardization:
 - the form of the reference ontology that is input and presented at the 
interface between the human knowledge engineer and the reference 
ontology capturing tool.  It is probably a combined graphical and text 
form, a la UML+OCL, or OWL+RDF.
 - the form of the reference ontology that is exported by the capturing 
tool for use by other tools, including but not limited to the Semantic 
Mediation Rules tool and the Mediator.  It is probably an RDF dialect.  
What all is captured here, or can be captured here, has some impact on 
the capabilities and possible behaviors of the Semantic Mediation Rules 
tool.  So this interface may be an important part of the tool-builder 
IP.  If there were enough experience to know what all might be useful to 
express, you could get agreement on a standard, even though most tools 
would only be able to use some of it.  Most importantly, however, a 
standard in this area that is not just a UML profile, or the like,
would require the toolsmith to build some kind of back-end for the
off-the-shelf UML or OWL tool that is the primary ontology input tool.  
And I would expect that many semantic mediation toolkits might just 
assume that a UML or OWL tool can be used and will generate the standard 
XMI or RDF formats.
 - the form of mediation rules that is input and presented at the 
interface between the knowledge engineer and the Semantic Mediation 
Rules tool.  This is an area that is by no means ripe for 
standardization, because the workings of this tool are very different in 
various designs.  Part of the rules generation process can be automated, 
and part of it requires human input, and how much is which, and how the 
automation is enabled, and what algorithms it uses, and how complex the 
executable rules for the Mediator can be, are all design decisions.  
This is a primary area of tool-builder IP.

So, IMO, the big question is what the form of the reference ontology 
is.  Do we need a new language for creating them?  Do we need a set of 
RDF additions to OWL, or a UML Profile for Reference Ontologies?  If we 
don't need a new language at all, then we already have all the standards 
we need, and we need to get some experience with commercial tools.

If we need a new language, then we also need to standardize its export 
form.  A UML profile can be processed by off-the-shelf UML tools and the 
models can be exported in XMI.  Similarly, an RDF add-on to OWL might be 
supported by an extension to an existing OWL tool and exported as 
described in OWL Full.  (Clark & Parsia are already doing this kind of
thing with Pellet.)  CLIF may be a desirable export form for some 
mediation tools, but it is a highly undesirable input form for knowledge 
engineers working with domain experts.  Domain experts can glean most of 
the content of UML and graphical OWL models with a little experience, 
but CLIF is about as intelligible as OWL/RDF or XMI or Old Church 
Slavonic.  A wholly new language requires a new set of tools and 
standards for both ends; a CLIF tool requires a new input form.  (One of 
the failures of OMG SBVR is that it exemplifies a possibly viable input 
form for rules and definitions that it does not standardize, and then 
standardizes an output form that merely competes with OCL and CLIF/IKL 
-- a new kind of train on existing tracks with no doors for the passengers.)

And at this time in history, I think the standardization of input to the 
Mediation Rules generator would be a mistake.  There is no agreement on 
how to generate such rules, or even what capabilities of the Mediator 
they must drive.  So, let us by all means have conferences and 
whitepapers on the subject, but please not as standards development 
projects.

> The Government Linked Open Data WG and the
> RDB2RDF WG are examples of practical things happening in the W3C that 
> will hopefully make some real progress possible. More of that kind of 
> thing, perhaps more focused at this particular problem, seems like the 
> only practical way forward to me.
>

Linked Open Data is the latest in a long line of webheaded information 
integration technologies, which is in no way related to semantic 
mediation, as far as I can tell.  RDB2RDF is a knowledge-free technical 
transformation of SQL relational database schemas to RDF Schema + SQL 
RDF dialect.  The object seems to be to allow the implementors of 
triple-store databases to use real industrial information that is stored 
in relational data management systems in a predictable way.  It is 
almost the antithesis of semantic mediation, in which the objective is 
to relate the database-engineered SQL schema to a knowledge-engineered 
domain ontology.  But it is the case that some mediation tools take 
exactly the RDB2RDF approach to the semantic mapping process, and similar
projects use XML Schema as the basis.  And let us not forget that Cory
is working on the OMG MOF2RDF standard to make standard RDF export forms
for UML models and BPMN models, etc., as RDF Schema + MOF RDF dialect.
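
To show what "knowledge-free" means in this context, here is a minimal Python
sketch of a mechanical table-to-triples conversion.  It is only loosely in the
spirit of such schema-driven mappings and does not follow the actual W3C
specifications; the base IRI, the table and column names, and the
rows_to_triples function are all invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def rows_to_triples(base: str, table: str, primary_key: str,
                    rows: List[Dict[str, Optional[object]]]) -> List[Triple]:
    """Turn each row into triples using nothing but the SQL schema's own names."""
    triples: List[Triple] = []
    for row in rows:
        # The subject is derived from the table name and primary key value.
        subject = f"{base}{table}/{primary_key}={row[primary_key]}"
        triples.append((subject, "rdf:type", f"{base}{table}"))
        for column, value in row.items():
            if value is None:
                continue  # SQL NULLs simply produce no triple
            # The predicate is just the column name; no domain concept is involved.
            triples.append((subject, f"{base}{table}#{column}", str(value)))
    return triples


triples = rows_to_triples(
    base="http://example.org/",
    table="CUST",
    primary_key="id",
    rows=[{"id": 7, "cust_nm": "Acme", "region": None}],
)
for triple in triples:
    print(triple)
```

The database engineer's abbreviations (CUST, cust_nm) pass straight through
into the RDF vocabulary.  Relating them to a knowledge-engineered domain
ontology is the separate step that semantic mediation is concerned with.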

This is exactly why W3C is not a better place.  I don't think we want 
semantic integration standards to be strongly influenced by RDB2RDF or 
Linked Open Data, any more than we want them to be influenced by MOF and 
SBVR and UML.

I suggest that we can make better progress by getting a whitepaper out 
there that identifies the architecture, standardizes a component and 
interface nomenclature, discusses the state of the art in mediation 
technology and the opportunities for standardization.  And I strongly 
agree that the Ontology Summit could contribute to the 'state of the art
in mediation technology' part, which is critical to the assessment of 
opportunities for standardization.  The AESIG has been so busy trying to 
generate acceptable RFPs that it has lost sight of its primary value as 
an Architecture Board SIG -- to provide education on the technology and 
guidance on the development of a program of work in this area.

-Ed

P.S.  In spite of NIST's strong interest in semantic mediation, we 
(primarily I, no surprise there) have been a thorn in Cory's side since 
the beginning of the SIMF RFP effort.  But I believe the suggestion for 
a workshop topic for the Ontology Summit is a much more valuable step, on
the way to the whitepaper that would form the basis for any kind of 
standardization plan, and by-the-by serve as a reference terminology for 
the emerging papers on the subject.  Part of the reason why the EU had 3 
different INTEROP projects doing semantic mediation (all differently) is 
that none of them used the same terms to describe what they were doing.

-- 
Edward J. Barkmeyer                        Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263                Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263                Cel: +1 240-672-5800

"The opinions expressed above do not reflect consensus of NIST, 
 and have not been reviewed by any Government authority."



> Cheers,
> David
>
> On 10/27/2011 4:33 PM, Cory Casanave wrote:
>   
>> Thanks Peter,
>> I have posted a suggestion on the ontology summit page as you suggested. I
>> would also be happy to explore a thread on the topic and have therefore
>> changed the title.  The initial message, below, can serve as a problem
>> statement.
>>
>> I would like to point out one clear fact: That with all the great work,
>> tools, research and products available - the problem of information
>> federation still exists and is getting worse.  What we have now is either
>> not working or not resonating.  We don't need and probably can't produce a
>> 100% solution - we don't have to.  Making a 20% improvement in our ability
>> to federate information and exchange data would be of immense benefit to
>> companies, governments and society.  I think we can do better than 20% and
>> part of that is accepting that the 100% solutions are not currently
>> practical.  We have to make the solution set (of which ontologies are only
>> a part), tractable and practical for widespread adoption - that has not
>> been the track record so far.
>>
>> This is a multi-billion dollar opportunity to address a pervasive and
>> recognized problem.  Let's get on with it.
>>
>> Regards,
>> Cory Casanave
>>
>> -----Original Message-----
>> From: peter.yim@xxxxxxxxx [mailto:peter.yim@xxxxxxxxx] On Behalf Of Peter Yim
>> Sent: Wednesday, October 26, 2011 7:00 PM
>> To: Cory Casanave
>> Cc: steve.ray@xxxxxxxxxx; [ontolog-forum]
>> Subject: [OT] process clarification [was - Re: [ontolog-forum] Some Grand
>> Challenge proposal ironies]
>>
>> Cory,
>>
>>> [CoryC] An area of interest to me and many of our clients is solving the
>>> information federation problem. ...
>>
>> [ppy]  A good topic indeed. However ...
>>
>> 1. if you are suggesting that folks discuss this "information federation
>> problem" on [ontolog-forum], please consider starting a new thread (with a
>> proper subject line) and move forward from there; or
>>
>> 2. if you are suggesting we (you addressing to Steve, following a remark of
>> his regarding the Ontology Summit indicates that this might have been your
>> purpose), it would be helpful if you condense the proposition to, say, a
>> short theme/title, with a brief (short paragraph) description and post it to
>> the http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit/Suggestions
>> page (like what Christopher has done), and then, via a message post,
>> highlight that suggestion, and take it forward similarly.
>>
>> (That would help allow this thread to stay on point to discuss what
>> Christopher is trying here.)
>>
>>
>> Thanks & regards. =ppy
>>     
>
>
>


