OntologySummit2014: (Track-C) "Overcoming Ontology Engineering Bottlenecks" Community Input workspace (44E6)
Track Co-champions: KrzysztofJanowicz, PascalHitzler, MatthewWest (44E7)
Background (44E8)
Ontology Engineering is the development and use of ontologies, in any form, as all or part of some system. This includes areas such as data integration, data mining, expert systems, data semantics, and reasoning. There are sometimes barriers to the use of ontologies because of the cost of development and deployment, or because solutions cannot be delivered in a timely manner. This track aims to seek out the bottlenecks that currently act as barriers to the use of ontologies, and to point towards solutions or work towards resolving those bottlenecks. (44E9)
Mission (44EA)
To identify bottlenecks that hinder the large-scale development and usage of ontologies, and to identify ways to overcome them. (44EB)
Examples (44EC)
- a) Bottlenecks include: (44FG)
- Ontology engineering processes that are time-consuming, (44FH)
- Social, cultural, and motivational issues (44FI)
- Modeling axioms or knowledge representation language fragments that cause difficulties, either by increasing reasoning complexity or by reducing the reusability of ontologies (44FJ)
- The identification of areas and applications that would most directly benefit from ontologies but have not yet considered their use and development. (44FK)
- b) Potential Solutions include: (44FL)
- Tools and techniques, (44FM)
- Research findings and methods, guidelines, documentation, and best practice, (44FN)
- Automation (44FO)
- the combination of inductive and deductive methods to scale the creation of axioms (44FP)
- The development of a set of reusable patterns that can ease ontology development and alignment (44FQ)
- The identification of purpose-driven modeling granularities that provide sufficient semantics without over-engineering (44FY)
- Lessons learned from ontologies that are seeing wide adoption (44FR)
- The development of tutorials and other educational materials (44FS)
- c) Pre-requisites (44FT)
- The track is concerned not only with outlining possible resolutions to the identified bottlenecks, but also with identifying the pre-requisites to addressing those challenges, which might include agreements that need to be reached, or capabilities that need to be developed. (44FU)
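The reusable-pattern idea under b) can be illustrated with a minimal sketch. The "AgentRole" pattern, the `?`-variable convention, and the two domain instantiations below are purely illustrative assumptions, not Summit artifacts; the point is only that one template can be stamped out across domains:

```python
# A minimal sketch of a reusable ontology design pattern, expressed as plain
# triples (no OWL tooling assumed). Pattern and domain names are illustrative.

PATTERN = [
    ("?Agent", "performsRole", "?Role"),
    ("?Role", "roleIn", "?Context"),
]

def instantiate(pattern, bindings):
    """Replace ?-variables with domain-specific class names."""
    return [tuple(bindings.get(t, t) for t in triple) for triple in pattern]

# The same pattern reused in two different domains:
medicine = instantiate(PATTERN, {"?Agent": "Physician",
                                 "?Role": "AttendingRole",
                                 "?Context": "HospitalStay"})
publishing = instantiate(PATTERN, {"?Agent": "Person",
                                   "?Role": "AuthorRole",
                                   "?Context": "Publication"})
```

The reuse benefit is that alignment between the two instantiations comes for free: both share the pattern's relational structure, so only the class bindings differ.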
Plan (44EW)
- 1. Examine the processes used to develop and use ontology-like artefacts in various contexts, and identify where the weight of effort falls. (44FV)
- 2. Look for opportunities to simplify or automate problem processes. (44FW)
- 3. Develop an outline to debottleneck the process, and identify any pre-requisites. (44FX)
see also: OntologySummit2014_Overcoming_Ontology_Engineering_Bottlenecks_Synthesis (44ED)
Community Input Solicited (44EE)
Please add your input as one-liners or short paragraphs, in bullets below, and make sure you include your name and date at the end, for attribution, tracking, and follow-up purposes. Thanks. -Track-C co-chairs (44EF)
- Ref. thread started by MatthewWest on 2013.01.18 (44EG)
- What is it that takes a lot of time and effort? (44EH)
- Broadly speaking, education and team buy-in. Most people don't understand ontologies well, and getting the supporting team to buy in and build towards the success of the project can become time-consuming and distracting. You really need strong leadership that is committed to the project. I've also found the Semantic Web understanding of ontologies to actually be a hindrance for certain classes of applications. Re-education in terms of what is actually possible is sometimes an additional obstacle. (46MX)
- There are two tasks that are rather time-consuming: (i) the extraction of knowledge from subject-matter experts (SMEs), and (ii) the explanation of the model to the developers using it. (46MY)
- Well, for starters the subject is complex, and so far nobody has achieved this lifecycle information integration. Furthermore, the ISO procedures are tedious (but worth it), and the technology is "bleeding edge". Standardization means finding a balance between large egos, commercial politics, short-term thinking, hard-to-make paradigm shifts, and, for the most part, a lack of funding. (46N6)
- Organizing ontologies for "efficient" processing. Avoiding combinatorial explosion when using machines that were designed to do arithmetic, not categories. (46NB)
- Accommodating N different perspectives on the meaning of a term and its relation to other terms. Need a way of signifying "-nyms" and webs of -nyms. (46MZ)
- Identifying the scope of knowledge to be represented and the context of use or application. (46ND)
- Defining Concepts is somewhat easier than defining useful relationships that structure the ontology model. (46NE)
- Refining the ontology during development to satisfy logical consistency (46NF)
- Modifying an ontology to capture expanding knowledge while ensuring logical consistency. (46N0)
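The consistency concerns in the last two bullets can be made concrete with a toy example. The class names and the naive disjointness check below are assumed for illustration only; real reasoners are far more capable, but the failure mode is the same: an innocent-looking extension silently makes the ontology inconsistent.

```python
# Toy illustration (assumed, not from the track material): extending an
# ontology with a new subclass assertion can give a class two ancestors
# that were declared disjoint, breaking logical consistency.

from itertools import product

subclass_of = {"Dog": {"Animal"}, "Robot": {"Machine"}}
disjoint = {frozenset({"Animal", "Machine"})}

def ancestors(cls, sub):
    """All superclasses of cls (transitively), plus cls itself."""
    seen, stack = set(), [cls]
    while stack:
        for parent in sub.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen | {cls}

def consistent(sub, disjoint):
    # No class may have two ancestors declared disjoint.
    classes = set(sub) | {p for ps in sub.values() for p in ps}
    for cls in classes:
        anc = ancestors(cls, sub)
        for a, b in product(anc, anc):
            if a != b and frozenset({a, b}) in disjoint:
                return False
    return True

assert consistent(subclass_of, disjoint)      # the initial ontology is fine
subclass_of["RoboDog"] = {"Dog", "Robot"}     # an innocent-looking extension...
assert not consistent(subclass_of, disjoint)  # ...introduces a contradiction
```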
(44EI)
- What is it that is very expensive? (44EJ)
- Access to SMEs. The higher the skill and importance of the SME, the more difficult it is to get their time. The lack of sophisticated ontology tooling also means that significant effort needs to be directed towards (46NL)
- The extraction of knowledge is expensive, as most SMEs have an "ideal" vision of what their knowledge is, which tends to differ from the actions and reasoning they actually use. For instance, doctors will answer questions theoretically, while their actual reasoning when consulting patients tends to be different. The sheer amount of labor involved. (46NH)
- Cost of IT system for moving the data and resolving the alternatives. (46NI)
- Refining and reviewing the ontology to satisfy requirements (46NJ)
- Integrating an ontology to an existing enterprise and software architecture (46NK)
- What is it that is held up because of a lack of scarce resources? (44EL)
- Generally, ontology development is bottlenecked by access to SMEs and to software developers who need to provide adequate infrastructure. Moreover, ontology deployment is also hindered when using functionality beyond the SemWeb stack. Formal, computational ontologies in general are not well developed. For example, if you want to deploy an ontology-based application that can reason on natural language questions such as "Who is standing behind you?" or "Who passed through this corridor in the last X hours?", you don't necessarily want to use a symbolic reasoner. Bindings into alternative reasoning algorithms and evaluation frameworks are still quite crude and require a lot of wheel reinvention. This slows down design-to-deployment time, drives up costs, and increases the overall risk of the project. (46NM)
- Discovery of new and better ways to discover, express and process ontologies. Most current human 'resources' are too intellectually invested in current rules and tools (which are not adequate). (46NN)
- An overarching data architecture with long term evolution and application of consistently defined concepts that can be reused with evolving and new services and applications (46NO)
- > Access to SME's. The higher the skill and importance of the SME, the more difficult it is to get their time.... (46NP)
- I agree. But I believe that we need *radically* different tools. The SMEs should do their work in *their* preferred languages and notations. They should *never* be asked to learn anybody else's notations, conventions, or interfaces. (46NQ)
- > Generally, ontology development is bottlenecked because of access to SME's and access to software developers that need to provide adequate infrastructure... (46NR)
- I agree. But the solution is to get the information from the same sources and tools that the SMEs themselves read, write, and use. (46NS)
- > Formal, computational ontologies in general are not well developed... Bindings into alternative reasoning algorithms and evaluation frameworks are still quite crude, and require a lot of wheel re-invention... (46NT)
- Those bindings should be made to the tools and resources the SMEs are already using to do their job. Any necessary ontologies should *help* the SMEs to do their work better and faster. (46NU)
- Why is it that ontological approaches are not taken when they could/should be? (44EN)
- There are a number of factors. (46NV)
- Sometimes, the long pay-off time makes these interventions either riskier, or outside the expected pay-off for the decision-maker, and hence less attractive. (46NW)
- Secondly, while the Semantic Web understanding of ontologies is useful for certain classes of applications, it is not well suited to many other applications. This can make it difficult to communicate the potential of ontologies, especially so if a culture has been "indoctrinated" with the SemWeb understanding of ontologies - in these cases, it is an uphill battle to get them to realize the value of a broader understanding of what ontologies can do. The recent mini-series on Rules, Reasoning and LP demonstrated this disconnect well, whereby one community differentiates between axioms and rules, while the other community does not restrict itself to considering "pure axioms" vs "rules". (46NX)
- Thirdly, many interventions I've seen don't fully take into account the sociological factors of the solution. Without a cogent understanding of the culture in which the technology intervention is taking place, there are many opportunities for misaligned expectations, yielding gaps in implementation or improperly used technologies. (46NY)
- Fourthly, a broad class of potential ontology-based applications can be achieved with a non-ontological approach faster and cheaper. We assert that the long-term value proposition in such cases is lower than for a "proper" solution, but demonstrating and clearly communicating the opportunity cost can be difficult. (46NZ)
- Lastly, and this complements the previous point, there is a dearth of popular / well-known successful ontology-based solutions. Whereas the benefit of say a CRM or a DB is well known, in many instances, those involved in ontology need to reiterate the value proposition nearly from scratch. (46O0)
- Time constraint on the delivery of the ontological artifacts mean that the model and its implementation are generally not separated. (46O1)
- We are essentially dealing with migrating content from XML to a world where the true semantics of the underlying content can be expressed. When we started our investigation, standards such as FRBR [1] were not very well defined, and ontologies (such as DoCO [2]) were not yet available. As a result, we created our own (proprietary) model to represent content, and we are still determining the best approach to documenting the ontology. We have tried using a simple graphical notation, UML, and HTML (generated using tools such as LODE [3]). Sadly, there are no tools that can provide the relevant information for different types of users without hours being spent on the documentation (which is a huge bottleneck at present). (46O2)
- In terms of natural language tools, I investigated the use of controlled natural language to express ontological knowledge, but failed to find an approach to easily express ontological axioms. I have also tried to use Fluent Editor [4], but I found it rather counter-intuitive and the import was problematic for highly modular ontologies (such as the one we have developed). [1] http://www.ifla.org/publications/functional-requirements-for-bibliographic-records [2] http://purl.org/spar/doco [3] http://www.essepuntato.it/lode (46O3)
- Current ontological approaches are too primitive. (46O4)
- Challenges people to say what they mean and mean what they say. Seen as personal threat rather than fit for purpose. (46O5)
- A basic reason is that no common language for technical literacy has been developed, or even seriously considered as a possibility. On seven leading edges, currently used languages are deficient compared with a design distributed at IBM in 1973. See "Inexcusable Complexity for 40 years" on my web site. (46O6)
- Lack of knowledge about the importance of an overarching data model, and of the role that semantics plays in defining and offering a consistent interpretation of data shared among applications and services across the enterprise and across systems. (46O7)
- I have used natural language tools that can semi-automatically create an ontology from text sources, but refining the results of tens of thousands of concepts into a consistent model is hard. An ontology by its nature has some logical formalism that enables logical reasoning, but for this to work the ontology has to satisfy the formalism's constraints, which NLP tools are poor at guaranteeing. The extraction also relies on the provenance of the text sources, so it is garbage in, garbage out again. (46O8)
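One small part of the refinement burden described above can be sketched in code. The extracted terms, and the crude normalization rule below, are hypothetical; the sketch only shows why post-extraction cleanup is labor-intensive: even merging obvious surface variants of the same concept requires deliberate rules.

```python
# Illustrative sketch (hypothetical data) of one cleanup step after NLP-based
# concept extraction: merging surface variants of the same concept by
# normalizing case, whitespace, and trivial plurals before human review.

def canon(term):
    """Crude canonical form: lowercase, collapse spaces, strip a plural -s."""
    t = " ".join(term.lower().split())
    return t[:-1] if t.endswith("s") and not t.endswith("ss") else t

extracted = ["Gene Expression", "gene expressions", "gene  expression",
             "Protein", "proteins"]

merged = {}
for term in extracted:
    merged.setdefault(canon(term), []).append(term)

# Two candidate concepts survive instead of five; each keeps its variants
# so an SME can verify the merge was legitimate.
```

Rules this crude misfire on real vocabularies (e.g. "species"), which is exactly why thousands of extracted concepts cannot be refined automatically.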
- > There are a number of factors. Sometimes, the long pay-off time makes these interventions either riskier, or outside the expected pay-off for the decision-maker, and hence less attractive... (46O9)
- I agree. But those are symptoms of not having the right tools. (46OA)
- > while the Semantic Web understanding of ontologies is useful for certain classes of applications, it is not well suited to many other applications... many interventions I've seen don't fully take into account the sociological factors of the solution...a broad class of potential ontology based applications can be achieved with a non-ontological approach faster and cheaper. (46OB)
- More symptoms of inadequate tools. (46OC)
- Fundamental principle: Ontology tools should *reduce* the expense by enabling SMEs to accomplish more in less time. The ontologies should be a *by-product* of the SMEs' normal work. (46OD)
- Recommendation: The ontology summit should devote more attention to cutting-edge research than to incremental improvements on inadequate tools. Some suggestions: (46OG)
- See the slides and publications by the Aristo Project at AI2: http://www.allenai.org/TemplateGeneric.aspx?contentId=12 (46OH)
- The IBM Watson project is also doing research on deriving knowledge from the same kinds of resources as AI2. (46OI)
- Tom Mitchell at Carnegie Mellon developed the Never-Ending Language Learner (NELL): http://rtw.ml.cmu.edu/rtw/index.php . Or see http://wamc.org/post/dr-tom-mitchell-carnegie-mellon-university-language-learning-computer (46OJ)
- For the past few years, I've mentioned Cyc as an important project that is doing important research with the world's largest formal ontology. (46OK)
- And from time to time, I cite the VivoMind work. For example, http://www.jfsowa.com/talks/goal7.pdf (46OE)
- I won't claim that these projects will solve all the problems tomorrow. But I believe that tools based on some combination of these methods will solve the problems raised by Matthew's questions. They'll get better results faster than trying to "educate" developers about ontology. (46OF)
- Ref the questions asked for the chat during the Track C session one: (46OR)
- How to arrive at reusable patterns? (46OS)
- How many patterns are there? (46OT)
- Are there types of patterns? (46OU)
- Are all patterns domain-independent? (46OV)
- Can we mine patterns from data? (46P1)
- [11:06] MatthewWest: I'll try to answer the 1st question. There are an unlimited number of patterns, because there are an unlimited number of atomic elements. Some patterns at least are domain dependent. Yes we can mine patterns from data, in fact this is one of the best ways to develop patterns. (46P2)
- [11:06] ToddSchneider: Without trying to be facetious, what is a pattern? How can one be identified? (46P3)
- [11:19] MikeBennett: (Summarizing @Aldo's verbal remarks) Complete repository of archetypical patterns... (primitives). Questions as to whether this is feasible. My take on this: maybe feasible in a simpler domain like business, more challenging if pursuing notion of archetypes for all human experience (per Leibniz etc.). I'm hearing confidence that the former at least can be done :) See also DOLCE. (46P4)
- [11:20] AldoGangemi: @Mike good summary (46P5)
- [11:21] MikeBennett: @Aldo Thanks - glad I captured it OK. This is something I am very motivated about. (46OL)
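Matthew's point that mining patterns from data is one of the best ways to develop them can be sketched as counting recurring relational shapes in instance data. The triples and threshold below are invented for illustration; real pattern mining would work over much larger graphs with statistical significance tests.

```python
# Illustrative sketch (all data hypothetical) of mining candidate patterns
# from instance data: count predicate chains of length two; shapes that
# recur across many subjects are candidate reusable patterns.

from collections import Counter

triples = [
    ("alice", "worksFor", "acme"),    ("acme", "locatedIn", "berlin"),
    ("bob", "worksFor", "globex"),    ("globex", "locatedIn", "paris"),
    ("carol", "authorOf", "paper1"),  ("paper1", "publishedIn", "journal1"),
]

# Index outgoing edges by subject so chains can be followed.
by_subject = {}
for s, p, o in triples:
    by_subject.setdefault(s, []).append((p, o))

chains = Counter()
for s, p, o in triples:
    for p2, _ in by_subject.get(o, []):
        chains[(p, p2)] += 1

# ("worksFor", "locatedIn") occurs twice: a candidate "employment place"
# pattern discovered, not invented, exactly as described above.
```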
- Who will develop and maintain these patterns? (46OW)
- Are there measures or at least experience reports on the robustness and usefulness of patterns? (46OX)
- Are there success stories of large-scale pattern usage? (46OM)
- [11:09] MatthewWest: 2nd Question. For true patterns, they will mostly be discovered, rather than invented. In the end, standards organizations will curate them. Good patterns are always useful, because they save effort and improve quality. The more a pattern is used the better it gets as bugs are eliminated. (46P6)
- [11:14] KarlHammar: Agree with Matthew's answer to the 2nd question above. ChrisWelty did a keynote at the Workshop on Ontology Patterns at ISWC 2010, touching upon exactly this. He called it "pattern archeology", i.e. the "digging up" of patterns from established systems/practices/models/etc. A process of discovery as opposed to design. Perhaps the keynote is available in the WOP proceedings. (46P7)
- [11:10] KrzysztofJanowicz: Kuhn's vision statement 'Modeling vs Encoding for the Semantic Web': http://www.semantic-web-journal.net/sites/default/files/swj35.pdf (46ON)
- How to abstract from individual ontology designs? (46OY)
- Do we need higher-level ontology modeling languages on top of knowledge representation languages? (46OZ)
- How to get community buy-in? (46OO)
- [11:12] MatthewWest: 3rd Question: when you abstract from ontology designs you are usually moving up the subtype/supertype hierarchy rather than out along the class-instance relation, so you should not normally need another language. Buy-in comes from utility plus ease of availability and use. (46OP)
- How important is the selection of specific language constructs for the scalability and reuse of patterns? (46P0)
- It is first of all important that the language constructs can support the requirements of the application; otherwise all is lost. However, there is often a way to restate things that is more efficient from a processing perspective, though it may make the resulting ontology more opaque. Generating more efficient language forms from more understandable forms may be a way forward. (46OQ)
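The idea of generating efficient forms from understandable forms can be sketched with one standard normalization step used by EL-style reasoners: a subclass axiom whose right-hand side is a conjunction is split into one axiom per conjunct. The tuple encoding of axioms below is an assumed toy representation, not a real OWL API.

```python
# Sketch (hypothetical axiom encoding) of mechanically restating axioms in a
# form that is cheaper to process. The human-readable source is kept; the
# normalized form is derived from it, never hand-maintained.

def normalize(axioms):
    """Split A SubClassOf (B and C and ...) into A SubClassOf B, A SubClassOf C, ..."""
    out = []
    for lhs, rhs in axioms:
        if isinstance(rhs, tuple) and rhs[0] == "and":
            out.extend((lhs, conjunct) for conjunct in rhs[1:])
        else:
            out.append((lhs, rhs))
    return out

# One understandable axiom becomes three simple, reasoner-friendly ones:
readable = [("Professor", ("and", "Person", "Employee", "Researcher"))]
efficient = normalize(readable)
```

Keeping the readable form as the source of truth addresses the opacity worry above: humans read one form, reasoners consume the other.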