OntologySummit2014: (Track-C) "Overcoming Ontology Engineering Bottlenecks" Synthesis    (43DY)

Track Co-champions: KrzysztofJanowicz, PascalHitzler, MatthewWest    (43DZ)

Background    (43ZC)

Ontology Engineering is the development and use of ontologies, in any form, as all or part of some system. This includes such areas as data integration, data mining, expert systems, data semantics and reasoning. There are sometimes barriers to the use of ontologies because of the cost of development and deployment, or the difficulty of delivering solutions in a timely way. This track aims to seek out the bottlenecks that represent the current barriers to the use of ontologies, and to point towards solutions or work towards resolving those bottlenecks.    (43ZD)

Mission    (43ZE)

To identify bottlenecks that hinder the large-scale development and usage of ontologies and identify ways to overcome them.    (43ZF)

Examples    (43ZG)

Plan    (43ZW)

see also: OntologySummit2014_Overcoming_Ontology_Engineering_Bottlenecks_CommunityInput    (43E2)


During the last Track C session on Bottlenecks in Ontology Engineering, we were asked some questions. These questions are given below, together with responses from the session and offline.    (4BRH)

1. What are the lessons learned from in-the-wild ontology engineering projects?    (4BRI)

Developing an OWL ontology has the same degree of difficulty as any other data modeling exercise (i.e. RDBs, ISO EXPRESS, E/R, UML and OWL all require language and tool expertise to do anything real). The hard problems are the same in most cases: understanding the requirements and generating clear, accurate but concise definitions everyone agrees on.    (4BRJ)

The big benefits of OWL ontologies are that they more accurately reflect "what is" than other data models, and that the underlying technology is so flexible that it enables very quick proof-of-concept and/or testing; it also allows throwing together almost anything and then fixing it up as you go.    (4BRK)

If you've not tested your ontology against real data, it is definitely "wrong". If you have tested your ontology against real data, it is less wrong, but still wrong. Plan improvements over time into your project, even once the apps are operational.    (4BRL)

2. How do challenges related to cultural and motivational issues relate to technical issues, e.g., tool support?    (4BRM)

A key issue I see is the separation between ontologies and the software apps that use them, and the skills required to cross that chasm. It's hard to get software developers to understand ontologies, and it's hard to get ontologists to understand the needs of software developers (e.g. the "human readable URI" discussion). We are a tool and solution vendor and have chosen the approach of fitting ontology development tools into a larger IDE for software development. Our view is that building ontologies is nice, but delivers very little business value compared with building complete Semantic Applications.    (4BRN)

People have been told ontology development is very hard and costly (which is not true as a proportion of overall project costs, and timely development delivers benefits in any case). Once they have that in their heads, it's very hard to convince them otherwise. In any engineering or science discipline, what practitioners have learned to do their job is orders of magnitude harder than ontology development. This adds to the challenge of making very robust tools that are simpler to use. The key is to make people understand that *with knowledge transfer* they can pick up enough to be productive quite quickly ... we run into many situations where short-sighted people skip that knowledge transfer, thinking they are saving money, and then pay the price over and over again as the project progresses.    (4BRO)

3. How to get community buy-in?    (4BRP)

Buy-in from specific organisations is far more important; it is very hard to directly convince a community of anything. However, convince a few individual organisations and, once they are successful, others will follow, if only for fear of being left behind. Some large organisations have a community that follows them (e.g. the DoD), so organisations like that are great candidates to take the lead. Still, it is important not to dictate models of the world.    (4BRQ)

4. What are the tradeoffs between expressiveness vs. pragmatics?    (4BRR)

Two kinds of ontology have different requirements.    (4BRS)

The first sort I would call a descriptive ontology, where the purpose is to be as accurate as possible about how some domain is, not so much for reasoning as for documentation. In this situation expressiveness is everything. If you cannot say something that is true, then that is a severe limitation.    (4BRT)

The second sort is aimed at solving a specific problem. This is likely a subset of some descriptive ontology (if such exists) where some specific constraints apply, which may enable more efficient reasoning to take place, or indeed make reasoning possible/practical.    (4BRU)

There is only a problem, in my view, if we try to insist that there is only one type of ontology for a domain, rather than potentially more than one, with relationships between them.    (4BRV)

Expressiveness is primary when you don't know the specific questions to be answered (e.g. in a data integration app). You can always transform to a less-expressive form when you finally do know the questions, but you have lost too much to easily go the other way if you start with the less-expressive ontology.    (4BRW)

5. Who will develop all the ontologies we would ideally need?    (4BRX)

Individual organisations will develop ontologies *they* need, not what others need. If they choose to share those openly that's great, but most commercial enterprises will not do so, as they see them as a business advantage over their competitors. There are, of course, cases where standards need to be created, and I expect most shared ontologies will come from standards bodies, consortia or national institutes over time. I'd say that there will eventually be foundations upon which you can build what you need, but nobody will ever "develop all the ontologies we would ideally need". Focusing on identifying and supporting those foundations is a good first step. For any domain of publicly available data (e.g. units of measure, company registration data, country codes), there ought to be an identifiable authoritative source. I would hope that those authoritative sources would eventually understand their responsibility to develop these ontologies. In many cases these authoritative sources will be public administration bodies, or standardisation bodies. This would at least be better than several bodies developing, say, Unit of Measure ontologies, as is the case at present.

→ Why would this need an identifiable authoritative source? E.g., the whole success story of the Web is the lack of such a source.

I think that is the wrong perspective. The success of the Web is that anyone can be an authoritative source, not just those who control, e.g., the media. Perhaps the confusion is over what it means to be an authoritative source. It really means a first-hand source. So someone who was there, for an account of some event, is more authoritative than someone who reports what someone who was there said. But a lot of data is created intentionally, such as country boundaries. Do we want to look at what Joe Bloggs says the country boundaries are, or what the UN says the country boundaries are? Why do we even want to see what Joe Bloggs thinks about this? It just adds confusion. Of course, in a boundary dispute, the governments of the countries disputing the boundary are also authoritative sources for their own territorial claims.    (4BRY)

6. What is the role of crowd-sourcing?    (4BRZ)

None at all in industries like engineering and the life sciences. In life sciences, for example, even the areas of interest are confidential. I'm sure there are others where it has an interesting role ... particularly where vocabularies rather than ontologies are primary.    (4BS0)

On the other hand crowd-sourcing created some of the most used ontologies on the Web, e.g., in the geo-domain.    (4BS1)

7. What is the state-of-the art with respect to quality control?    (4BS2)

Test, test, test ... exactly as with any other software artefact. The state of the art is therefore automated testing driven by robust requirements and testing tools. We do not manage to do this properly for every app, but we do set up our ontology projects exactly as if they were normal software development projects, and build in test cases, testing processes and validation/QA infrastructure, even if we don't use it everywhere. In our best case, with this fully set up, we can literally push a button and have a dashboard go green when all the automated test cases pass, which for our customer means that "acceptance test" is successful and we're ready to deploy to production.    (4BS3)

In every app that will go operational, we obviously have the "production" deployment and a separate "testing" deployment environment.    (4BS4)

There are some things that tools can help with, like logical consistency, but overall fitness for purpose is a human endeavour, and likely will be for some time, though hopefully with increasing computer assistance.    (4BS5)
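The "push-button acceptance test" idea described above can be sketched as competency questions compiled into executable assertions. This is only an illustration over a toy in-memory triple store; a real project would use an RDF stack such as Jena or rdflib, and all class, property and individual names here (Pump_101, CentrifugalPump, hasManufacturer) are hypothetical.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
triples = {
    ("Pump_101", "rdf:type", "CentrifugalPump"),
    ("CentrifugalPump", "rdfs:subClassOf", "Pump"),
    ("Pump_101", "hasManufacturer", "AcmeCorp"),
}

def types_of(individual):
    """Asserted types plus types inferred via rdfs:subClassOf."""
    found = {o for s, p, o in triples
             if s == individual and p == "rdf:type"}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p == "rdfs:subClassOf" and s in found and o not in found:
                found.add(o)
                changed = True
    return found

def answer_cq_pumps_with_manufacturer():
    """Competency question as an executable test:
    'Which pumps have a recorded manufacturer?'"""
    return sorted(
        s for s, p, o in triples
        if p == "hasManufacturer" and "Pump" in types_of(s)
    )

# The "acceptance test": green dashboard when this passes.
assert answer_cq_pumps_with_manufacturer() == ["Pump_101"]
```

In a real setup each competency question becomes one such test case, run by the normal CI infrastructure alongside the application's other tests.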

8. How is the industry addressing ontology engineering bottlenecks and what are the technological solutions available on the market today?    (4BS6)

I have found that the more you treat ontology development as part of a larger software app project, where requirements management, testing, software change control, etc. are used, the better. In 90+ percent of cases where project problems or delays have occurred, the root cause has been a lack of clear requirements, changed requirements, or the customer's lack of understanding of their own requirements. The "work" is not that hard if requirements are complete. The work is hard if you're given an undocumented XML Schema as the data requirements for your app (and that does happen).    (4BS7)

Addressing ontology engineering bottlenecks can be approached in several ways. We think the following are important for RDF/OWL development.    (4BS8)

Being a software company, I guess it's obvious that I'll add that the TopBraid Suite is a technical solution available on the market today. TopBraid sits on top of Eclipse, so that should be mentioned as well.    (4BSB)

Given that I said ontology development is really a component of software development, I will add that Jena, GitHub, JIRA, SpiraTest, SOAPUI, Confluence, Google Docs, MySQL, GoToMeeting and Skype are all components of the larger technical solution for distributed semantic app and ontology development.    (4BSC)

9. How much (deep) semantics do customers really need?    (4BSD)

There is no single answer; it depends on the industry. For example – Engineering: quite a lot. Life sciences: medium. Publishing: very little, as they focus on vocabularies. The priority is identity (the same name (ID) for the same thing across those that need to share information).

→ So what about the type level? Having data about the same car does not mean the data is compatible.

I'm not sure what you mean here. At the type level, you want a common identifier for a type, just as at the instance level. I also do not see how data about the same thing can be incompatible. I can have two pieces of data about a car where one says it is red and the other that it is blue, but either one of these is wrong, or they refer to the car at different times; neither of these makes the data incompatible. So what does it mean for data about one thing to be incompatible?    (4BSE)

Some other points that came out during the session are…    (4BSF)

Which ontology tool do we use?    (4BV3)

Starting from Excel (as per Oscar's slide 5) is often done and can be very useful, BUT things go awry when people forget that the semantics are not explicit or enforced on entry. For example, I've seen folks send a group of domain experts off to build an initial concept capture using an Excel template. The results vary widely in how different groups interpret the semantics of the template, and there isn't anything in Excel-as-development-environment to help. Working *with* an ontologist, it's not so bad, as that person can be on the lookout for semantic drift. So Excel is a start, not the end. ROO ( available at: http://sourceforge.net/projects/confluence/ ) is a tool specifically designed to work with a non-ontology-savvy audience.    (4BSG)
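The semantic-drift problem with Excel capture can be partly mitigated by making the template's intended semantics executable. A minimal sketch, assuming a hypothetical Term/Definition/Broader template exported as CSV; real capture templates will be richer, and the column names and data here are purely illustrative:

```python
import csv
import io

# Hypothetical concept-capture template exported from Excel as CSV.
raw = io.StringIO(
    "Term,Definition,Broader\n"
    "Pump,A machine for moving fluid,Equipment\n"
    "Valve,,Equipment\n"
    "Equipment,A physical asset,\n"
)

def validate(rows):
    """Enforce what the template only implies: every term needs a
    definition, and every 'Broader' value must itself be a captured term."""
    known = {r["Term"] for r in rows}
    problems = []
    for r in rows:
        if not r["Definition"]:
            problems.append(f"{r['Term']}: missing definition")
        if r["Broader"] and r["Broader"] not in known:
            problems.append(f"{r['Term']}: broader term '{r['Broader']}' not captured")
    return problems

rows = list(csv.DictReader(raw))
print(validate(rows))  # → ['Valve: missing definition']
```

Checks like these do not replace the ontologist, but they catch the mechanical drift early, leaving the human free to watch for genuinely semantic misinterpretations.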

How do we reuse other ontologies?    (4BV4)

Start working with experts so that they provide their definitions, and get agreement on those. Decide on reuse when you know what your requirements are. We need to remember that reuse is not an end in itself, but a possible means of delivering a solution quicker and cheaper. However, whilst reuse is not an end in itself, if there are good things to leverage, reusing them would help get towards standardization. In addition, if one finds that something is not reusable, stating the defects helps the field.

Reuse can reduce cost because you do not have to redevelop. It can also help increase quality, as reuse tends to get rid of bugs. Finally, if you have integration requirements across applications, then using the same ontology for both will reduce the costs of interfacing. These are all, however, ends, which reuse alone is not.

We should not forget that it is not only about reusing other ontologies, but also about allowing the one that you create to be reused (e.g., in my examples, across the open data portals community in Spain).

"Software engineers tend to have a preference for 'their own' solutions." This generalises well beyond software engineers, data engineers, or engineers as a whole; it is more or less true of most of us.    (4BSH)

The methodology tells me to…    (4BV5)

Rec: use an agile approach, based on sets of competency questions for each sprint. There is a step to create between competency questions and user stories: create competency questions from user stories (e.g. "As an instrument designer I want to be able to represent calibration data …).    (4BSI)
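One lightweight way to keep the story-to-competency-question step explicit is to track each derived question alongside its source story per sprint. A sketch with purely illustrative data and field names:

```python
from dataclasses import dataclass

@dataclass
class CompetencyQuestion:
    text: str        # the question the ontology must be able to answer
    user_story: str  # the user story the question was derived from
    answered: bool = False  # set when an executable test for it passes

backlog = [
    CompetencyQuestion(
        text="Which calibration records exist for a given instrument?",
        user_story="As an instrument designer I want to represent calibration data",
    ),
]

def sprint_scope(backlog):
    """Competency questions still open for this sprint."""
    return [cq.text for cq in backlog if not cq.answered]
```

The point of the structure is traceability: each sprint's ontology work is scoped by open questions, and each question can be traced back to the story that motivated it.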

Large groups work more slowly    (4BV6)

Create a small team of experts (5?), who have the confidence of the larger group. Rec4: avoid non-experts, and use experts all from the same level. On group size, see http://scienceblogs.com/effectmeasure/2009/01/15/the-right-or-wrong-size-for-a/ ... < 20, /= 8?    (4BSJ)

But these ontologies to reuse are in English    (4BV7)

Many ontologies intended for reuse are designed in English, and it is assumed all users will use English – this is not valid. It is pragmatic for IDs to be in the language of the developer, since this helps the development and debugging process. IDs should be hidden from end users, who should be able to choose the language for the labels they see.

I want my ontology to do inferences…

Just work with text patterns, and guide people to write good term definitions. Multiple projects over many years have also found a sweet spot in form-based or diagram-based entry tools that are customised by an ontologist, for particular sets of SMEs and elicitation cases, and that generate the formal ontology under the hood without showing it to the SMEs. This can be less lossy. The OWL API fixes a lot of broken stuff behind the curtain; work is underway to make these fixes noisier in version 4.

Can this be Controlled Natural Language (CNL)? ACE? Some find ACE too controlled, and too demanding of obvious information, to be useful for normal people. Using a more ambiguous grammar with semantic disambiguation would be better for most, but editor support made a big difference (for entry; comprehension was good). Editor support could make a huge difference. We also need reverse verbalization support. If only there was a common-sense knowledge base to start from -    (4BV1)

	https://github.com/Kaljurand/owl-verbalizer    (4BV2)

ROO is better than ACE ( http://sourceforge.net/projects/confluence/ ); we developed it in Leeds :-) EdBarkmeyer of NIST and FabianNeuhaus (when he was at NIST) were working on a controlled language on top of Common Logic (and a CL reasoner); I don't know the final state of their effort. The published version of TobiasKuhn's survey of CNLs is finally out: http://www.mitpressjournals.org/doi/abs/10.1162/COLI_a_00168 The final report on the NIST effort (RECON) is at http://www.nist.gov/manuscript-publication-search.cfm?pub_id=911267    (4BSK)
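The verbalization idea can be illustrated with a toy sketch: map a couple of axiom patterns onto controlled English, roughly in the spirit of tools like the owl-verbalizer linked above. Real CNL systems cover far more of OWL than this; the axiom encoding and vocabulary here are purely illustrative.

```python
# Toy verbalizer: render a few axiom patterns as controlled English.
# Axioms are encoded as tuples, e.g. ("subClassOf", subclass, superclass).

def verbalize(axiom):
    kind = axiom[0]
    if kind == "subClassOf":
        return f"Every {axiom[1]} is a {axiom[2]}."
    if kind == "disjointWith":
        return f"No {axiom[1]} is a {axiom[2]}."
    return "(axiom pattern not supported by this sketch)"

print(verbalize(("subClassOf", "centrifugal pump", "pump")))
# → Every centrifugal pump is a pump.
```

Reverse verbalization (controlled English back to axioms) is the harder direction, which is where editor support and semantic disambiguation matter most.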

I want my ontology to be light-weight…    (4BV8)

Rec: again, text patterns are the best option to follow here.

The ontology is done, but is it good? It's OK to run the reasoner, but that won't tell you enough. Go for other non-logical checks (e.g., use the OOPS! pitfall scanner).    (4BSL)
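Non-logical checks of the kind OOPS! performs can be pictured as a lint pass over the ontology. A minimal sketch with an illustrative data structure; real scanners work on the RDF/OWL serialisation and cover many more pitfalls than the two shown here:

```python
# A reasoner will happily accept a class with no label and no connections;
# a lint-style pitfall scan can flag both. Structure and names are toy examples.

ontology = {
    "classes": {"Pump": {"label": "Pump"}, "Widget": {}},
    "axioms": [("Pump", "subClassOf", "Equipment")],
}

def lint(ont):
    issues = []
    mentioned = ({s for s, _, _ in ont["axioms"]}
                 | {o for _, _, o in ont["axioms"]})
    for name, meta in ont["classes"].items():
        if "label" not in meta:
            issues.append(f"{name}: missing label/annotation")
        if name not in mentioned:
            issues.append(f"{name}: not connected to any other element")
    return issues

print(lint(ontology))
# → ['Widget: missing label/annotation', 'Widget: not connected to any other element']
```

Such checks are cheap to automate, which is why they complement rather than replace the reasoner: consistency is necessary but far from sufficient for fitness for purpose.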

How do I tell others how to use the ontology?

Simple documentation (in HTML, in Word), with simple examples, with a link to the revised competency questions, and a simple diagram!!    (4BSM)

Automation    (4BV9)

Two of our presentations gave examples of automation to overcome what would otherwise be bottlenecks; support for mundane tasks is a key approach. DhavalThakker: "Modelling Cultural Variations in Interpersonal Communication for Augmented User Generated Content". JohannesTrame, presenting on behalf of PeterHaase: "Developing Semantic Applications with the Information Workbench – Aspects of Ontology Engineering".    (4BSN)

Domain and range    (4BVA)

Regarding the domain & range disuse view: I have run into this occasionally, and think it is bad practice based on a misdiagnosis. The underlying problem is that the domain and range are set more restrictively than is really the case. Not specifying domain and range is recommended as a supposed fix for the frequent occurrence of properties that are not represented at a correct and consistent level of generality. It is sometimes as simple as the name of the property being too general (e.g. "controls" instead of "controlsFinancially"); sometimes it is more complicated. An appropriate correction, at minimum, is to apply a bit of discipline in identifying the intended specificity of the property, naming and labelling it in a way that reflects that, and setting a domain and range appropriate to it. It is also good practice to evaluate whether you can define the narrow property that you need immediately as a subPropertyOf a more general property that already exists or that you can also create. This helps to define your specific property more clearly, as well as creating or connecting to reusable content.

Given the intended meaning of a concept, it should surely be given the domain (and maybe range) which corresponds to that meaning; e.g. a property that is explicitly about contracts should have a domain of Contract. But this requires imagination, so that when you think about the meaning of a property you think about all the things it can be a property of, and all the kinds of thing it can be framed in terms of – creating a sub-property or a restriction as appropriate for the concept you were originally thinking of.

Why are domain/range constraints so problematic? They arise quite naturally from any UML class diagram. These mistakes of over-constraining domain and range are routinely made in UML diagrams, with relationships being stated at a lower level of abstraction than is really true. For example, an ontology for equipment may say that one type of equipment must have another type of equipment as a part, but there are things other than equipment for which this is true.

The problem is worst in OWL because people frequently misunderstand the effect of domain and range there; indeed, I have only seen this disuse recommendation for OWL, perhaps because it is harder there (than in more expressive languages) to say what you mean to say about domain and range. This is because in OWL they have an inferential semantics, and most non-DL conceptual modellers do not know that and think of them as constraints. This makes their usage difficult and often problematic. The constraint vs. type-inference consequences are a big source of confusion, exacerbated by the difficulty of creating constraint-like domain/range in OWL, versus other languages. In some languages there are simply alternative properties to use depending on which type of assertion you mean to make (see schema:domainIncludes or CycL arg constraints, for example). There are also N-BOXes, which are attempts to add NAF to OWL; see: http://trowl.eu/

FIBO started out with what is on the corresponding UML class diagrams, and created a deep subsumption hierarchy of properties. This wasn't ideal for OWL usage, since in many cases the multiple properties represented the same meaning with some changes to range. The balance we are trying to aim for is to have a separate property only when there is an identifiably new meaning in play. However, if I'm honest, we haven't achieved that in the current version (someone decided to promote loads of properties to have no domain or range!!).

You can use Events and States as classes, both for NLP and other uses, and so have a Stative like Possess, which is generic but has local property restrictions for generic thematic participants (doing the job of domains/ranges), then have more specific events/states under these with more specialised property restrictions.

The point is that the discussion of domain/range is part of the ontological analysis phase of ontology design, and that it is not some new concept that is foreign to someone who knows UML class diagrams. See this G+ post from BernardVatant this morning, and the related comments (on domain/range specification in LOV vocabularies): https://plus.google.com/114406186864069390644/posts/D3kkqNCoQZ9 You can conclude what is a domain/range from a restriction, but without at least saying what the domain/range of a property is, how can you relate concepts with one another? So I'd say that domain/range is a minimum to imply some structure on an ontology.    (4BSO)
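The inferential (rather than constraint) reading of rdfs:domain can be shown concretely. A toy sketch with a hand-rolled version of the RDFS domain entailment rule and hypothetical data; a real RDFS/OWL reasoner draws the same conclusion:

```python
# In RDFS/OWL, rdfs:domain is an inference rule, not a validation constraint.
# Asserting a triple whose subject "should not" be in the domain raises no
# error; instead, the subject's type is inferred.

triples = {
    ("signedBy", "rdfs:domain", "Contract"),
    ("Invoice_7", "signedBy", "AliceJones"),  # an invoice, not a contract
}

def apply_domain_rule(ts):
    """One application of the RDFS domain entailment rule (rdfs2):
    (p rdfs:domain c) and (s p o) entail (s rdf:type c)."""
    inferred = set(ts)
    for s, p, o in ts:
        for prop, kw, cls in ts:
            if kw == "rdfs:domain" and p == prop:
                inferred.add((s, "rdf:type", cls))
    return inferred

# No complaint is raised; Invoice_7 is silently concluded to be a Contract,
# which surprises modellers who read domain/range as UML-style constraints.
assert ("Invoice_7", "rdf:type", "Contract") in apply_domain_rule(triples)
```

This is exactly the confusion described above: a UML modeller expects the second triple to be rejected, while the DL semantics instead uses it to classify the subject.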

 maintained by the Track co-champions ... please do not edit    (43E3)