OntologySummit2014: (Track-B) "Making use of Ontologies: Tools, Services, and Techniques" Synthesis    (43DS)

Track Co-champions: ChristophLange, AlanRector    (43DT)

Mission Statement:    (43DU)

The Web of Data provides great opportunities for ontology-based services, but it also poses challenges for tools for editing and using ontologies, and for techniques for ontology engineering.    (43MX)

Services such as information retrieval, question answering, or planning can draw on a much larger pool of knowledge if they make use not just of a few static ontologies but tap into the whole Web of Data, including both the Semantic Web and Big Data sources.    (43N3)

The larger the amount of data and the denser its interlinking, the higher the chance of finding up-to-date information that matches the user's expectations. However, there is an increased risk that the data is messy or incomplete and contains erroneous, irrelevant or contradictory information, which requires pre-processing or filtering before one can provide a high-quality service.    (43MW)

Big Data, be it big ontologies or big datasets as instances of ontologies, requires scalable tools. Not only do we require tractable reasoning; ontology editing, browsing and visualization also become challenging once ontologies or their instance data grow larger than one computer's main memory.    (43MU)

Traditional ontology engineering techniques, including heavyweight ontologies and methodologies, may not be easily applicable or optimal in a Big Data setting. It may be hard to build a new ontology from data if that data is spread all over the Web and incomplete or messy. Finding and using relevant parts of large ontologies can also be challenging. It may likewise be challenging to refactor an ontology: if instance data is being added continuously, its production cannot be suspended to accommodate changes to its schema.    (4BXL)

Ontological applications on the (big) Web of Data can profit from lightweight ontologies and methods, which can provide a focused ontological commitment while still affording linkage to complex descriptions.    (4BXM)

These are the challenges and questions we would like to address in this track.    (43N8)

Track Plan and Deliverables    (43YX)

1. Discuss the mission statement of the track.    (43YY)

2. Make suggestions about how ontology tools and techniques could be scaled up to the Semantic Web (by which we don't mean small OWL ontologies but really the Web of Data as a whole) and Big Data, identifying in particular the open problems in this direction. Other tracks will consider similar issues; the focus in this track will be on techniques and tools (and on the services that they enable).    (43YZ)

3. Make suggestions about how ontology-based services could benefit from tapping into the Semantic Web and Big Data, or how, vice versa, the access to and management of Big Data can be improved or even enabled using ontologies.    (43YW)

4. Find at least one example of an ontology tool or technique for which we can propose how to scale it up to the Semantic Web or Big Data. This could potentially be a Hackathon project.    (43Z0)

5. Develop the write-up (a section) that synthesizes the results of the discourse in this track as a contribution to the OntologySummit2014.Communique.    (43Z1)

see also: OntologySummit2014_Ontology_Tools_Services_Techniques_CommunityInput    (43DW)

Draft Synthesis    (46S3)

The Web of Data provides great opportunities for ontology-based services, but it also poses challenges for tools for editing and using ontologies, and for techniques for ontological reasoning and ontology engineering.    (46S4)

Here, we use the term “Web of Data” to subsume both Big Data and the Semantic Web. The “big” in Big Data is commonly defined as big, and therefore challenging, in one or more of volume, velocity and variety (cf. OntologySummit2014_Tackling_Variety_In_BigData_Synthesis). We use “Semantic Web” with a particular emphasis on Web, i.e. making sense of knowledge distributed over the Web. This is in contrast to, say, using a local OWL reasoner on a small ontology, where the only “Web” aspects are the use of IRIs as symbol names and the employment of inference rules based on the open world assumption.    (46S5)
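
To make this contrast concrete, the following minimal Python sketch (using the rdflib library; the example.org namespace and the individuals are invented for illustration) shows both “Web” aspects: IRIs serve as symbol names, and a missing statement is read as unknown rather than false.

```python
# Minimal sketch (rdflib; the example.org IRIs are invented) of IRIs as
# symbol names and of the open-world reading of missing statements.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.alice, RDF.type, EX.Person))  # the only fact we state

print((EX.alice, RDF.type, EX.Person) in g)  # True: explicitly stated
# Under a closed world assumption the next triple would be false; under the
# open world assumption its absence only means "unknown".
print((EX.bob, RDF.type, EX.Person) in g)    # False here means "not stated"
```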

Services can draw on a much larger pool of knowledge if they make use not just of a few static ontologies but tap into the whole Web – but are ontologies relevant for this? IBM Watson answers rich natural language questions over a broad domain of knowledge, giving precise answers with an accurate assessment of confidence and consumable justifications within seconds (cf. http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_01_30#nid455F). The Watson developers did not build a formal ontology of the world with which they would try to unify formal logical representations of the questions. Instead, they locally learned ontologies on demand, drawing on formal as well as informal sources and using different reasoning techniques. First, hypotheses are generated. Second, evidence is retrieved for them; approaches include keyword matching against as-is natural language text sources. The challenge here is to disambiguate the types (e.g. “person” vs. “place”) of entities and predicates, which is partly solved using existing taxonomies such as YAGO. Finally, the evidence is scored, largely based on machine learning, i.e. statistical techniques.    (4BDB)
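
The following highly simplified Python sketch illustrates the three stages described above: hypothesis generation, evidence retrieval via keyword matching, and scoring with type disambiguation. The taxonomy, corpus and scoring weights are all invented; a real system such as Watson draws on large corpora, taxonomies like YAGO, and statistically learned scoring models.

```python
# Highly simplified sketch of the three-stage pipeline described above.
# TAXONOMY, CORPUS and the scoring weights are invented stand-ins.

TAXONOMY = {"Paris": "place", "Marie Curie": "person"}

CORPUS = [
    "Marie Curie won two Nobel Prizes.",
    "Paris is the capital of France.",
]

def generate_hypotheses(question: str) -> list[str]:
    """Stage 1: propose candidate answers; here, simply every known entity."""
    return list(TAXONOMY)

def retrieve_evidence(candidate: str) -> list[str]:
    """Stage 2: keyword matching against as-is natural language text."""
    return [sentence for sentence in CORPUS if candidate in sentence]

def score(candidate: str, expected_type: str, evidence: list[str]) -> float:
    """Stage 3: combine type agreement and evidence volume. Watson learns
    this combination statistically; the weights here are arbitrary."""
    type_match = 1.0 if TAXONOMY.get(candidate) == expected_type else 0.0
    return 0.7 * type_match + 0.3 * len(evidence)

question, expected_type = "Who won two Nobel Prizes?", "person"
ranked = sorted(
    ((score(c, expected_type, retrieve_evidence(c)), c)
     for c in generate_hypotheses(question)),
    reverse=True,
)
print(ranked)  # the type-matching, evidence-backed candidate ranks first
```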

Indeed, it is unlikely that we will be able to make Web-wide ontological commitments. Where Watson limits itself to a few simple taxonomies, other large collaboration efforts may agree on a limited set of shared ontologies, such as the Gene Ontology and other OBO Foundry ontologies in molecular biology. It is possible to create ontologies from big data, but it is hard: manually building ontologies is labour-intensive, mining data for reusable information suffers from the potential inconsistency, incompleteness and irrelevance of data “out there”, and machine learning may require further research before it can be applied to learning ontologies from big data. It will be interesting to see what combination of ontology engineering and reasoning techniques will be used for big data problems. Watson is currently making its first steps beyond playing Jeopardy!, into health care. Finally, should one even try to represent large amounts of knowledge using ontologies? Do even lightweight ontologies scale to big data? Or would it suffice, as use cases in biology suggest, to use ontologies merely for annotating big data with terms?    (4BDC)
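
As an illustration of the “annotate rather than model” option raised by the biology use cases, the following Python sketch simply tags data records with term IRIs from an existing ontology. The record fields and the gene-to-term mapping are invented for this example (the two IRIs are real Gene Ontology term identifiers, but real annotations would come from curated resources).

```python
# Illustrative "annotate, don't model" sketch: plain data records are tagged
# with ontology term IRIs, with no attempt to model the records themselves.
records = [
    {"gene": "BRCA1", "expression": 12.3},
    {"gene": "TP53", "expression": 8.7},
]

# Invented mapping for illustration; curated databases provide real ones.
GO_ANNOTATIONS = {
    "BRCA1": "http://purl.obolibrary.org/obo/GO_0006281",  # DNA repair
    "TP53": "http://purl.obolibrary.org/obo/GO_0006915",   # apoptotic process
}

for record in records:
    record["go_term"] = GO_ANNOTATIONS.get(record["gene"])  # lightweight hook

print(records)
```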

A swing back to lightweight approaches has also occurred in the field of web services. Generally, a service consumer finds a web service that a service provider has registered in a central registry, and then communicates with the web service in order to execute it. Semantic web service descriptions, in addition to the basic syntactic WSDL description, are required for finding and comparing service providers, for negotiating and contracting services, for composing, enacting and monitoring them, and for mediating between heterogeneous data formats, protocols and processes. Traditionally, the semantics of web services would have been described using heavyweight ontologies such as WSMO or OWL-S, based on expressive ontology languages, and these services would have been assumed to communicate by heavyweight XML messages according to SOAP. As the semantics-first modeling approach promoted by WSMO and OWL-S was not taken up in practice, the more recent linked services initiative (http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_03_13#nid48JA) now promotes a bottom-up annotation and interlinking approach with more lightweight RDF(S)-based ontologies for service description, and it faces the reality that the majority of web services are implemented using lightweight REST interfaces.    (4BDG)
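
The following sketch suggests what such a bottom-up, lightweight service description might look like: a plain REST endpoint is annotated with a handful of RDF(S) triples instead of a full WSMO or OWL-S model. The SVC vocabulary below is invented purely for illustration; the linked services initiative instead reuses shared vocabularies published on the Web. The graph is built with rdflib.

```python
# Lightweight annotation of a REST service with a few RDF(S) triples.
# The SVC vocabulary and all example.org IRIs are invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS

SVC = Namespace("http://example.org/service-vocab#")
svc = URIRef("http://example.org/services/weather")

g = Graph()
g.add((svc, RDFS.label, Literal("Weather lookup")))
g.add((svc, SVC.endpoint, URIRef("http://example.org/api/weather")))  # REST URL
g.add((svc, SVC.inputParameter, Literal("city")))
g.add((svc, SVC.outputFormat, Literal("application/json")))

print(g.serialize(format="turtle"))
```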

RDF, the native language of linked data, goes a long way in big data settings because of the low ontological commitment it enforces, while still allowing one to link to complex descriptions. For example, the Open Semantic Framework (OSF, http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_03_13#nid48J9) integrates different state-of-the-art engines and web services, partly but not exclusively based on ontologies, with a Drupal-based content management system, on top of a single, internal, canonical data model using RDF and some OWL. This model allows for representing structured, semi-structured and even unstructured data, and gives reasoners access to these data. The resulting platform translates the underlying data structures into widgets that power complex interactive web applications, in which ontologies inform interface displays, define component behaviors, guide visualization template selection and content filtering, and help to navigate and organize web portals.    (4BDH)
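
The following Python sketch (rdflib again; all namespaces and properties are invented, and OSF's actual internal model differs in detail) illustrates the idea of a single canonical RDF model: structured fields and unstructured text about the same resource live in one graph, which a front-end widget can query uniformly via SPARQL.

```python
# Sketch of one canonical RDF graph holding both structured fields and
# unstructured text (invented example.org terms; not OSF's actual model).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.doc42, RDF.type, EX.Report))                        # structured typing
g.add((EX.doc42, EX.year, Literal(2014)))                     # structured field
g.add((EX.doc42, RDFS.comment, Literal("Free-text notes.")))  # unstructured text

# A widget's data needs, expressed as one SPARQL query over the same graph.
query = """
SELECT ?year ?notes WHERE {
  ?doc a <http://example.org/Report> ;
       <http://example.org/year> ?year ;
       <http://www.w3.org/2000/01/rdf-schema#comment> ?notes .
}
"""
for row in g.query(query):
    print(row.year, row.notes)
```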

Building lightweight ontologies – one often speaks of vocabularies instead – requires new, agile engineering techniques. The recent Linked Open Terms (LOT) approach starts with reuse, taking advantage of the great number of vocabularies that already exist on the Web (http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_03_13#nid48JC). Where the terms needed to describe the data at hand cannot be found in existing vocabularies, the knowledge engineer will have to create new ones, but is encouraged to link them to existing ones. Both the reuse of existing vocabularies and the development of one's own vocabularies are continuously accompanied by evaluation tools such as the ontology pitfall scanner OOPS!.    (4BDI)
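
A minimal sketch of the reuse-first step of such an approach: a new local term is introduced only where no existing term fits, and it is immediately linked to a well-known vocabulary. FOAF is a real vocabulary (bundled with rdflib); the local namespace and class are invented.

```python
# Reuse-first vocabulary building: link a new class to FOAF instead of
# modelling "person" from scratch. MY and FieldResearcher are invented.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, OWL, RDF, RDFS

MY = Namespace("http://example.org/myvocab#")

g = Graph()
g.add((MY.FieldResearcher, RDF.type, OWL.Class))
g.add((MY.FieldResearcher, RDFS.subClassOf, FOAF.Person))  # link, don't reinvent
g.add((MY.FieldResearcher, RDFS.label, Literal("Field researcher")))

print(g.serialize(format="turtle"))
```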

Regarding reasoning techniques, OWL may still be suitable in today's applications. In the OSF it allows for duplicate names (as OWL makes no unique name assumption) and for incomplete information (thanks to its open world assumption), and it proves extensible to multiple schemas. Still, the variety and heterogeneity of the types of information, schemas and software on the wider Web of Data may challenge the reuse of ontologies, tools and techniques. As most tools and techniques date back to the hand-building of small ontologies for specific applications, they are limited to a single formalism, or a few, often OWL and RDF(S), which leads to a reuse-unfriendly situation similar to the well-known siloing of knowledge.    (4BDJ)
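
To illustrate the point about duplicate names, the following rdflib sketch states that two invented IRIs denote the same individual via owl:sameAs; a reasoner (not invoked here) would then merge the facts known under either name.

```python
# OWL makes no unique name assumption: two different IRIs may denote the
# same individual, stated explicitly with owl:sameAs. IRIs are invented.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL

EX = Namespace("http://example.org/")
DB = Namespace("http://other.example.org/")

g = Graph()
g.add((EX.w3c, OWL.sameAs, DB.worldWideWebConsortium))  # same entity, two names
print(g.serialize(format="turtle"))
```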

As we approach big data, it is worth taking a step back and reflecting on whether all the information we are dealing with is ontology (cf. http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_01_30#nid455G). Should ontologies cover everything there is, or rather everything we know about what there is? Even in the latter case, traditional ontology languages assume that there is such a thing as universal knowledge (even though it may not be fully stated in the current ontology: other parts of it may be stated somewhere else in the open world), whereas in the real world knowledge is often contingent, accidental or particular. Should we have the same representation for all knowledge, or should our architectures partition the knowledge formally? Concretely, OWL is good for expressing universal knowledge in an open world, whereas template formalisms such as frames, UML or rules are good for expressing contingent knowledge. (“Template”, here, refers to a pattern for a data structure that specifies the fields/slots/attributes that may be used to hold information about concrete elements in the domain; see https://www.escholar.manchester.ac.uk/api/datastream?publicationPid=uk-ac-man-scw:201196&datastreamId=FULL-TEXT.PDF .) Despite the OntoIOp initiative discussed below, it seems that the relations amongst different formalisms are not yet well understood. Description logics, as in OWL, focus on intensional descriptions with a model-theoretic semantics. The practical usage of RDF(S) and its query language SPARQL (which may be used to express rules) far exceeds the usage of OWL (as was mentioned by DavidPrice), but its users are often unaware of its formal semantics. It does, however, cope reasonably well with heterogeneous data, and is therefore the basis of many big data systems. UML and relational database schemas are well understood, widely used for knowledge representation, and have good visualisation tools, but how do they interact with other formalisms? UML was not originally intended to be formalised; still, many possible formal semantics have been devised for it, and OntoIOp adopts one of them. Finally, rules are a good fit for many practical problems, e.g. in business, but their standardisation lags behind that of ontology languages.    (4BDD)
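
The following Python sketch contrasts the two styles under invented names: the universal statement ("every Employee is a Person") is naturally an ontology axiom, whereas the template is a data structure whose slots hold contingent, particular facts about one individual.

```python
# Universal vs. contingent knowledge under invented names: an OWL-style
# axiom about all Employees vs. a frame-like template for one individual.
from dataclasses import dataclass

from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/")

# Universal, open-world knowledge: holds for every instance, stated once.
axioms = Graph()
axioms.add((EX.Employee, RDFS.subClassOf, EX.Person))

# Contingent knowledge: a template (frame) whose slots hold particular,
# possibly accidental facts about one individual.
@dataclass
class EmployeeRecord:
    name: str
    employer: str  # accidental: this person happens to work here right now
    salary: float  # particular: true of this individual, not of all Employees

alice = EmployeeRecord(name="Alice", employer="Acme", salary=50000.0)
print(alice)
```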

One approach to further breaking down the boundaries between tools is to use tools and their underlying ontology languages more creatively. OWL is widely supported by tools that work (such as the OSF introduced above), which makes it attractive and usable (AndreaWesterinen). Template formalisms are better suited for expressing contingent knowledge – but one can at least take inspiration from them when using OWL (cf. http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_01_30#nid455G). The emerging tools in the ecosystem created by the OntoIOp standardization effort (cf. http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_01_30#nid455E) integrate many universal-knowledge languages having an axiomatic semantics, but there are also first steps towards frames, UML and rules. OntoIOp addresses the variety problem with the Distributed Ontology, Modeling and Specification Language (DOL), a metalanguage for ontology and data languages. The Hets and Ontohub tools developed in this context support alignments and reasoning across ontology languages. While not yet “big” with respect to the volume of supported ontologies, the mechanisms that these tools provide for representing ontologies in a modular fashion, with different modules written in different ontology languages, and for distributing modules over the Web, prepare the ground for future work towards Big Data on the Web. Other challenges on the way there have already been mastered, such as retrofitting linked data conformance into languages that had so far been designed for use on a single system, but it is still a long way towards the adoption of universal URIs as identifiers of concepts regardless of the concrete ontology language in which they are formalized.    (46SC)
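
DOL has its own concrete syntax and dedicated tools (Hets, Ontohub), which the sketch below does not attempt to reproduce; purely for illustration, it shows the underlying idea of an alignment as an explicit mapping between the terms of two independently developed ontology modules, with all IRIs invented.

```python
# Illustrative stand-in for the alignment idea: explicit correspondences
# between the terms of two independently developed modules. IRIs invented.
alignment = {
    "http://a.example.org/Person": "http://b.example.org/Human",
    "http://a.example.org/worksFor": "http://b.example.org/employedBy",
}

def translate(term: str) -> str:
    """Rewrite a module-A term into module B's vocabulary, if aligned."""
    return alignment.get(term, term)

print(translate("http://a.example.org/Person"))  # -> http://b.example.org/Human
```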

Major remaining problems for tools include visualisation and the scalability of reasoners, although the latter has improved by orders of magnitude over the past five years.    (4BDE)

At the end of this Ontology Summit, a concern remains which may well prepare us for the next Ontology Summit: the requirements for ontology-based tools, services and techniques in a big data world are not yet clear. Until somebody presents the "killer application", people do not know they need it. There are few human factors studies; a recent study on "Design Insights for the Next Wave Ontology Authoring Tools" (http://robertdavidstevens.wordpress.com/2014/01/31/issues-in-authoring-ontologies/) by the University of Manchester is an exception. For now, it seems one needs to reflect on the few existing exemplars and ask what it would have taken to build them more efficiently and effectively.    (4BDF)


 --
 maintained by the Track co-champions ... please do not edit    (43DX)