Difference (from revision 7 to 8)

Changed: 33c33

The principal goal of the summit is to bring together and foster collaboration between the ontology community, systems community, and stakeholders in "big systems." {nid 36IU}
The principal goal of the summit is to bring together and foster collaboration between the ontology community, systems community, and stakeholders in "big systems." {nid 39JJ}

Changed: 35c35

We will aim towards producing a series of recommendations describing how ontologies can create an impact; as well as providing illustrations where these techniques have been, or could be, applied in domains such as bioinformatics, electronic health records, intelligence, the smart electrical grid, manufacturing and supply chains, earth and environmental, e-science, cyberphysical systems and e-government. {nid 36IE}
We will aim towards producing a series of recommendations describing how ontologies can create an impact; as well as providing illustrations where these techniques have been, or could be, applied in domains such as bioinformatics, electronic health records, intelligence, the smart electrical grid, manufacturing and supply chains, earth and environmental, e-science, cyberphysical systems and e-government. {nid 39JK}

Added: 46a47,60

The common thread of this summit for big systems is models and modeling and the need to have models with greater fidelity and interoperability. The primary driver for a modeling approach to systems engineering and development is simply cost in time and money, and resultant system value. {nid 39JL}

Among the current approaches to mitigate some of the cost factors associated with engineering are executable architectures and model based engineering. Each approach involves a model to either understand the thing being designed or to provide a predictive base of design. In each case current methodologies and tools fail to provide {nid 39JM}

* sufficient rigor in their ability to adequately represent the system for the needs of the entire engineering lifecycle and its environment, {nid 39JN}
* explicit semantics, leaving those in the minds of the modelers, {nid 39JO}
* the use of logical inferencing to automate processes. {nid 39JP}

The lack of adequate fidelity of models, their conceptualizations, and consistent semantics during engineering phases can lead to poor design, miscommunication across the lifecycle and among stakeholders, implementation errors, re-work, and systems that neither meet their expected uses nor can be cost-effectively extended to meet unanticipated needs. During operation such systems may be difficult to maintain, update, or extend. Moreover, there is a growing expectation for systems to be more 'intelligent': to be able to adapt, or at a minimum be adaptable, to new needs without incurring large costs. {nid 39JQ}

The world has been in a 'Cambrian' age of information explosion for the last decade but this is in transition to a new era of knowledge. The information age has resulted in the production of unprecedented amounts of data and information - Big Data. Accompanying this abundance of data and information are Big Systems that attempt to handle it and provide ‘knowledge’. {nid 39JR}

Finally, as we move into the knowledge age there is a growing expectation that our systems will be more self-describing and intelligent. In order to engineer such systems, allow intuitive use and meet expectations of all stakeholders, a more consistent and complete use of ontologies and ontological analysis must be made. {nid 39JS}

Added: 47a62,208


Over the past 100 years, humans have entered the ‘Cambrian’ age for information, knowledge and systems. The amount of information and knowledge produced, published and shared across the globe has been growing exponentially each year. In the past decade alone, more data has been collected, more video has been produced and more information has been published than in all of previous human history. At the same time, with the advent of the computer, digital representations, and the Internet, it has been possible to model more of the complexity of systems, connect more people and connect more (information) systems. With all this new information (aka Big Data) and all these new systems (aka Big Systems), there has also been an attendant growth in the complexity of systems that model physical phenomena and handle information, as well as in their size, scale, scope and interdependence. {nid 39JT}

To address the problems that have arisen during the Cambrian period of information and knowledge, we need novel tools and approaches. Some of the major challenges facing Big Systems stem not only from their scale, but also their scope and complexity. At the same time, there are novel challenges for Big Systems when different, dispersed groups work together toward a common goal, for instance in understanding Climate Change. This leads to a need for better solutions for interoperability among federated systems and for fostering interdisciplinary collaboration. {nid 39JU}

Given the broad scope of this year’s theme, Ontology for Big Systems, the summit was organized along three tracks and two cross-track initiatives. This communique seeks to distill and construct a whole from the activities that occurred within each track and throughout the summit. The interested reader is encouraged to visit the synthesis and community pages for further information. In addition, each of the meeting pages, containing links to the presentations, audio recordings, and chat sessions, is available for review. The tracks were as follows: {nid 39JV}

* Big Systems Engineering {nid 39JW}
* Big Data Challenge {nid 39JX}
* Large Scale Domain Applications {nid 39JY}
* Quality Cross Track {nid 39JZ}
* Federation and Integration of Systems {nid 39K0}

=== Big Systems Engineering {nid 39K1} ===

Engineers and designers have always used a variety of models as part of their disciplines. Designing a car, a power plant, or a transportation system relies heavily on creating a model of the system. Similarly, models are used extensively in trying to understand how complex systems such as the body or the climate work. These models express a theory of the world, or of a part of it. In the computing age, it has become far easier to create and share these models. {nid 39K2}

Different fields deploy models of varying sophistication, though in many, the semantics - the meaning - of the model and its parts are implicit or governed by inconsistent convention. But the promise of model reuse, a desired goal, is hindered by differences in semantics. So a gradual shift to explicit semantics is underway, first in engineering and slowly in other fields. {nid 39K3}

The various disciplines within engineering are evolving from using informal modeling, to using formal languages to model their systems, to underpinning said languages with explicit semantics, to recognizing the importance of understanding the underlying ontology of the elements of the languages. {nid 39K4}

There are various standardization efforts underway to advance the semantic and ontological foundations, from the development of ISO 15926, to providing formal semantics for the Unified Modeling Language. Similarly, groups are working to build repositories of ontologies, or libraries of ontology patterns - snippets that formalize important aspects of reality such as “part-of” or “is-a”. {nid 39K5}

=== Big Data and Applications {nid 39K6} ===

A key component of the current explosion of information is the proliferation of vast amounts of data. Greater computing power makes it easy to create and track data. Whether it be encoding an organism's DNA, tracking Internet usage, tracking credit usage, running the experiments at the Large Hadron Collider or collecting weather satellite data, each of these activities creates a staggering amount of data. {nid 39K7}

While the sheer size and scale of these data sets present their own challenge, knowing how to first understand the data, garner information and knowledge from it, and then intelligently combine it with other data sets means that there is a need to accurately represent (the portion of) the world this data represents. This in turn requires that each data source adequately represent itself and what was intended by the publication of the data. {nid 39K8}

Moreover, if we want to combine data from multiple sources, then it becomes all the more important that we understand what each source intended by the publication of the data. {nid 39K9}

To do this, we need theory. There are limits to blind statistical analysis. We need theory and statistical analysis together. Data publishers need to make explicit what their data represents, as do the systems that consume and transform it. Intelligently using this data and combining it for useful ends involves developing theories about the relevant parts of the world, especially if we want successful data reuse and adaptability. {nid 39KA}

There are a variety of groups working towards this vision. For example, the Linked Open Data (LOD) effort (URL) seeks to connect distributed data across the net. While there are many data sources available online today, that data is not readily accessible. The LOD cloud aims to create the requisite infrastructure to enable people to seamlessly build “mash-ups” by combining data from multiple sources. {nid 39KB}

Similarly, there has been a surge of work in bioinformatics, including the Open Biological and Biomedical Ontology, Gene Ontology and other sources which annotate big data with explicit semantics. These initiatives allow research groups to publish findings on genes, gene expression, proteins and so on in a standardized, consistent manner. {nid 39KC}

Another example is the FuturICT project funded by the European Union. Its ultimate goal is to understand and manage complex, global, socially interactive systems, with a focus on sustainability and resilience. FuturICT will build a Living Earth Platform, a simulation, visualization and participation platform to support decision-making of policy-makers, business people and citizens. {nid 39KD}

=== Interoperability {nid 39KE} ===

The Internet means that it is far easier for different people in the different parts of the world to share and combine data, information and knowledge. If we want to realize the true potential of this interconnected world it means that we need to be able to combine not just our data, but also our models. {nid 39KF}

An initiative like Sage Bionetworks might allow a doctor in China to integrate diverse molecular mega-datasets and reuse predictive bionetworks built by a team in the United States that deploy new insights into human disease biology from a team in France. Each community views and prioritizes parts of the world according to its own viewpoints and interests.
Similarly, within a single enterprise, the same product may be viewed differently by each of the marketing, engineering, manufacturing, sales and accounting departments. Making sure that these views are, if not harmonized, then aligned so that information can be successfully shared entails solving interoperability. {nid 39KG}

Semantic analysis is a fundamental, essential aspect of federation and integration. Building value by combining the views of different communities means solving interoperability, and that means negotiating the implicit meaning used by each of these groups. {nid 39KH}

The Object Management Group has recently put out a request for proposals to create a standard to address such issues. Similarly, within the systems engineering community, one example is the ISO 15926 standard, which aims to federate CAD/CAM/PLM systems at industry, business and ecosystem-wide scales. {nid 39KI}

=== Interdisciplinary Collaboration {nid 39KJ} ===

Similarly, as knowledge has become more specialized, different communities have developed their own bodies of knowledge. Bridging these gaps can unleash a lot of potential, foster innovation, reduce the reinvention of the wheel and accelerate the development of better tools. {nid 39KK}

While each specialization may use its own jargon and technical language, the underlying reality is the same. Ontologies, in the form of explicit statements of the assumptions in each sub-field, can help identify points of overlap and interest between different communities. They can serve as tools to facilitate search and discovery.
The Linked Science effort is a project that aims to create an “executable paper.” It hopes to combine publication of scientific data, metadata, results, and provenance information using Linked Data principles, alongside open source and web-based environments for executing, validating and exploring research, using Cloud Computing for efficient and distributed computing and deploying Creative Commons for its legal infrastructure. {nid 39KL}

Another project, the iPlant Collaborative, is building the requisite cyberinfrastructure to help cross-disciplinary, community-driven groups publish and share information, build models and aid in search. The vision is to develop a cyberinfrastructure that is accessible to all levels of expertise, ranging from students to traditional biology researchers and computational biology experts. {nid 39KM}

== State of the Practice vs State of the Art {nid 39KN} ==

Most aspects of engineering involve models, many times residing solely in the engineer’s mind. In the process of engineering big systems there are many models, and possibly quite complex models, developed by different disciplines, different teams and different people which may be geographically and culturally dispersed. But models from different disciplines have different levels of expressivity or fidelity, different degrees of automation, and are not interoperable in general. Aside from differences in tools and modeling syntax, more fundamentally, different and not necessarily compatible conceptualizations and interpretations usually arise. At various points in the system's development and operational lifecycle(s) these differences must be resolved and models integrated, or at a minimum, differences bridged, to achieve interoperability, including syntactic, conceptual, and semantic, in order for collaboration and continued development to occur. These efforts to resolve incompatibilities add additional time and costs. {nid 39KO}

To mediate at least the possible semantic differences among models, engineering has progressively shifted from informal modeling, for instance chalk/whiteboard sketches or textual descriptions, to modeling in formal languages that support more explicit and complete semantics. However, beyond the issues of semantic differences of models, there can be, and are, differences in conceptualizations. These differences may not always be readily apparent and sometimes manifest in modeling languages. {nid 39KP}

The modeling of big or complex systems (note that “system” can also refer to big data sets) requires conceptualizations within multiple domains of relevance to the system(s), their use(s), and engineering processes. Ontologies represent conceptualizations of aspects of a domain or environment, while ontological analysis provides a more thorough methodology for understanding and distinguishing the complexity of big systems. Modeling, in all its various guises, is an area where ontology and ontological analysis are starting to be used and have great potential. {nid 39KQ}

Ontologies can be viewed as patterns for what constitutes a system with parts and connections: the identity, dependence, and unity of systems irrespective of their particular nature. Informally, a system is an entity that consists of components, where the components are connected in some way such that the system as a whole exhibits some behavior. Engineered systems are usually designed such that the components are replaceable. Key relations like classification, specialization, and whole-part are well understood in the realm of ontology, and see major application in systems engineering. Computer-based modeling languages provide some built-in support for component modeling and provide facilities for extending the language’s ontological commitment, but they are usually neither sufficient to support formal semantics and logical inferencing nor expressive enough to take advantage of rigorous ontological analysis. {nid 39KR}

For the class of big data, the attendant challenges include those common to engineering in general, namely differences in conceptualizations and differences in semantics; in addition there are differences in terminologies, that is, different data/information models. Aside from these challenges, most big data resides in systems that were engineered for specific purposes and were embedded with implicit assumptions and semantics. In order to make the most effective use and reuse of such data/information, the conceptualizations and semantics must be made explicit and (machine) accessible. Doing so will allow for the automation of the discovery of new relationships and knowledge. {nid 39KS}

The class of big systems derives from the federation of systems. There are multiple dimensions to the federation of systems: hardware, software, organizations and people. Challenges in the application of ontology for these systems include modeling, which requires a broad collection of conceptualizations and semantics spanning the federates; the ability to reuse data and artifacts from one life-cycle stage in later life-cycle stages; and integrating the models and their artifacts using multiple modeling languages. Thus if different systems include their own ontologies, they too must be federated: federated ontologies for federated systems. This is a requirement for ontologies: that they can behave as components and can be assembled to contribute to an ontology of the whole. Yet in general ontology developments are one-offs, and it is rare for ontologies to be reused or be reusable. For ontology to be useful for engineering, ontologies must be developed so that they are reusable. {nid 39KT}
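
The idea of ontologies behaving as components can be illustrated with a minimal sketch. It is not a method from the summit; the `OntologyModule` type, the `federate` function and all module and term names are hypothetical. It assumes, for simplicity, that each ontology component declares which terms it defines and which it uses, and that federation must at least detect terms defined twice and terms used but never defined.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyModule:
    """A hypothetical ontology component: the terms it defines and the terms it relies on."""
    name: str
    defines: set = field(default_factory=set)
    uses: set = field(default_factory=set)


def federate(modules):
    """Assemble modules; report terms defined twice and terms used but never defined."""
    owner = {}       # term -> name of the first module that defines it
    conflicts = []   # (term, first definer, later definer)
    for m in modules:
        for term in sorted(m.defines):
            if term in owner:
                conflicts.append((term, owner[term], m.name))
            else:
                owner[term] = m.name
    # A used term is unresolved if no module in the federation defines it.
    unresolved = sorted(
        (m.name, term) for m in modules for term in m.uses if term not in owner
    )
    return owner, conflicts, unresolved


# Illustrative federates covering hardware, organization and software dimensions.
hardware = OntologyModule("hardware", {"Sensor", "Pump"}, {"Organization"})
organization = OntologyModule("organization", {"Organization", "Person"})
software = OntologyModule("software", {"Sensor"}, {"Person", "License"})

owner, conflicts, unresolved = federate([hardware, organization, software])
```

In this toy federation, the duplicate definition of `Sensor` and the undefined `License` are exactly the kind of mismatch that would need to be reconciled by ontological analysis rather than by name matching alone; the sketch only surfaces them.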

With the advent of standards for the semantic web and supporting tools it has become easy for most people to create ontologies. Unfortunately most people have not had training or even exposure to ontological analysis, the result being that myriad incompatible ontologies are being developed in ad hoc ways leading to the creation of new semantic silos.
Many sources of ontologies that abound today come from software development. Here ontologies are viewed as more sophisticated data models (which of course they are, but not just that). And, as with the early age of software development, there are few, if any, engineering practices and processes for ontology development or ontological engineering. {nid 39KU}

A hallmark of systems engineering, distinguishing it from less rigorous systems creation activities and essential to success in developing large-scale and complex systems and managing them throughout their life-cycles, is the rigorous use of requirement specifications, requirements-centric architecture and design, multi-stage testing and revision, and other risk-management and quality assurance techniques. Quality at any of these levels is defined in terms of the degree to which any one of the system, component, process, etc., meets the specified requirements. Analysis and specification of requirements and functions at each of these levels, along with identification and application of relevant quality measures, is an essential part of good systems engineering. In order for the potential of ontology to be realized in engineering, and especially for its application to complex or big systems and big data, these same practices need to be applied. {nid 39KV}

== Recommendations {nid 39KW} ==

This section represents a distillation of the discussion in this year’s summit focused on recommendations. The intersection between ontology and big systems and big data spans many communities, disciplines, and levels of depth. Regardless of the community, the success of any ontology intervention requires understanding its intended environment and the problem space to be addressed. Clarifying how ontology fits into the larger picture will shape what level of expressiveness and semantics is required and how they may be employed in a project. Not all ontologies need to be reasoned over, and rarely are they the end product. {nid 39KX}

In considering the use of ontology one has to gauge the level of “semantic maturity” of the organization and environment in which the use is proposed. To what degree does the broader organization understand ontology or the application of ontology? To what extent are such technologies already being deployed? Will the shift be incremental or might it be perceived as disruptive? {nid 39KY}

=== Ontologies and Engineering {nid 39KZ} ===

From the Systems Engineering community, there was a strong emphasis on the importance of modeling, and explicating the underlying concepts and their semantics. A number of candidate modeling languages were considered alongside their deficiencies in semantic and conceptual clarity (ontology representation languages among those). It was further noted that developing an ontology of a problem space or domain as a referent conceptual model allows an organization to decouple this knowledge from any particular information model or technology implementation. This makes the conceptual model technology-agnostic, allowing it to be reused and realized in whichever technology stack is most appropriate. There are many engineering and enterprise tasks where ontology is definitely applicable and would provide great value, but is not yet in wide use. {nid 39L0}

=== Ontologies and Their Infrastructure {nid 39L1} ===

Systems engineering is all about understanding the whole and the relationships between the parts. It involves assembly from components and support for the use of the same parts in different systems. This calls for ontologies which can themselves be components of other ontologies and be assembled into an ontology of the whole system. Yet in general ontology developments are one-offs, and it is rare for ontologies to be reused or be reusable. For ontology to be useful for engineering, reusable ontologies that support reusable engineering models will be important. {nid 39L2}

Big systems have a long life and usually change over that life. They tend to interact with their environment and change state as a result of interaction. This means conceptualizations are needed to model state change and system evolution throughout its lifecycle which in turn means that the ontologies that describe a system need to be able to change, but in a way that means that the history of changes is not lost. This requires a sophisticated approach to change and configuration management, both in model and ontology creation and maintenance. {nid 39L3}

When deciding what ontologies to use or implement, there is a consensus that, where possible, ontologies should be reused from pre-existing sources. Two such sources were explored: ontology repositories holding full ontologies, and ontology patterns that capture successful representations of particular relations or “snippets” of a domain. The former have the advantage of providing a more comprehensive solution, while the latter afford greater flexibility and, in theory, allow the designer to pick and choose among a variety of patterns to best meet their needs. {nid 39L4}

Foundational ontologies contain conceptualizations needed for modeling, especially at the enterprise scale. These include processes, events, descriptions, plans, physical quantities, individuals, types, etc. Further, ontologies provide relationships between the concepts which can be exploited to relate the data needed to determine program status. Some enterprises have recognized that ontologies generalize information models and provide better access and organization than traditional data models. {nid 39L5}

=== Ontology and Engineering Practice {nid 39L6} ===

Determining exactly which ontology is appropriate for an application is an involved task and requires a number of judgments in terms of the desired expressivity, comprehensiveness and breadth. To this end, it was recognized that a number of distinct problems are often conflated. It is wise to disentangle:
1. The level of expressiveness (representation) needed for your domain. This is development time expressiveness.
2. The level of expressiveness (representation) it takes to efficiently reason over the ontology at run-time. This is run time expressiveness.
3. Transformation of the representation of (1) to (2), i.e., knowledge compilation. {nid 39LD}

Too little expressivity may make it impossible or cumbersome to represent an essential aspect of your problem space. Conversely, allowing extraneous expressivity for reasoning can severely affect run time performance. A vital task for any ontology implementation is to understand the level of expressivity required by the problem space, while also accounting for performance criteria. One observation was that ontologies work best when not compromised by implementation tradeoffs. {nid 39LE}
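
The distinction drawn above between development time expressiveness, run time expressiveness, and the transformation between them can be illustrated with a deliberately small sketch. It is an assumption-laden toy rather than a description of any tool discussed at the summit: the class names are hypothetical, and the only "reasoning" considered is subclass transitivity. The development-time form lists direct subclass assertions; the compilation step precomputes every superclass once, so that run-time checks become simple set lookups.

```python
from collections import defaultdict

# Development-time representation: direct subclass assertions only.
# The class names are hypothetical, chosen for illustration.
SUBCLASS_OF = [
    ("CentrifugalPump", "Pump"),
    ("Pump", "Equipment"),
    ("Equipment", "PhysicalObject"),
    ("Valve", "Equipment"),
]


def compile_ancestors(subclass_of):
    """Knowledge compilation step: precompute all superclasses of every class."""
    parents = defaultdict(set)
    classes = set()
    for child, parent in subclass_of:
        parents[child].add(parent)
        classes.update((child, parent))

    ancestors = {}

    def visit(cls, path):
        if cls in ancestors:
            return ancestors[cls]
        if cls in path:
            raise ValueError(f"cycle in subclass hierarchy at {cls!r}")
        result = set()
        for parent in parents[cls]:
            result.add(parent)
            result |= visit(parent, path | {cls})
        ancestors[cls] = result
        return result

    for cls in classes:
        visit(cls, frozenset())
    return ancestors


def is_a(compiled, cls, candidate):
    """Run-time check: a set lookup against the compiled form, no inference needed."""
    return cls == candidate or candidate in compiled.get(cls, set())


compiled = compile_ancestors(SUBCLASS_OF)
```

The trade-off is the usual one: the compiled form answers queries quickly but must be regenerated whenever the development-time ontology changes, and richer constructs than subclassing would need a correspondingly richer compilation step.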

This means that more work is required to build adequate support frameworks for such tasks; current support is minimal. When it comes to the deployment or construction of an ontology, the target community should be included in the development and evolution of the vocabularies, yet engineers turned ontologists often don’t have the necessary background or skills. The transition from implicit domain knowledge to explicit encoding requires community consensus, which in turn requires an organizational commitment to create the necessary infrastructure to manage such consensus. {nid 39LF}

In those applications where the ontology will impact end users, there is broad consensus that the nitty-gritty details of the ontology should be hidden. At most, end users should be exposed to something like the Simple Knowledge Organization System (SKOS), while the more complicated constructs should be deployed only on the back end. For example, on one successful project ontologies were used as configuration templates which user interface specialists then used to tailor views for their end users. {nid 39LG}
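
As a rough sketch of what hiding the details behind a SKOS-like view might look like, the following projects a richer back-end model down to labels and broader links. The `BACKEND` structure, the `ex:` identifiers and the axiom strings are hypothetical; only the property names `skos:Concept`, `skos:prefLabel` and `skos:broader` come from SKOS itself. Mapping subclass links to `skos:broader` is a simplification assumed here for illustration.

```python
# Hypothetical back-end ontology entries: each class has a label, direct
# superclasses, and richer axioms that end users do not need to see.
BACKEND = {
    "ex:Pump": {
        "label": "Pump",
        "parents": ["ex:Equipment"],
        "axioms": ["hasPart some ex:Impeller"],
    },
    "ex:Equipment": {"label": "Equipment", "parents": [], "axioms": []},
}


def to_skos_view(backend):
    """Project the back-end model to SKOS-style triples (labels and broader links only)."""
    triples = []
    for iri in sorted(backend):
        entry = backend[iri]
        triples.append((iri, "rdf:type", "skos:Concept"))
        triples.append((iri, "skos:prefLabel", entry["label"]))
        for parent in entry["parents"]:
            triples.append((iri, "skos:broader", parent))
        # The richer axioms are intentionally not emitted in the user-facing view.
    return triples


view = to_skos_view(BACKEND)
```

The back-end axioms remain available to reasoners and validation tooling, while user interfaces consume only the simplified view.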

It’s also been observed that the proliferation of ontologies has not been accompanied by adequate tools or methodologies to gauge the quality of the ontologies. Quality dimensions/criteria/attributes and measures vary with the specific project at hand. We currently do not have a clear understanding and virtually no documentation as to how that variation works. Experienced ontologists develop a sense of this, but it is implicit and not made accessible to others. {nid 39LH}

Are the ontologies fit for purpose? Any ontology project should not only pay attention to quality, but also develop a quality policy. How would the organization measure the success of the ontology project? While no standard methodology currently exists, there are some efforts within the literature. Consequently, a more systematic effort is needed. Concurrently, it is important to spread the understanding that ontologies need to be viewed as technical artifacts that need requirements and quality assurance. {nid 39LI}

In general, {nid 39LJ}

* Expose users to SKOS semantics; use more complicated constructs only on back end if necessary. {nid 39LK}
* Look for the 80-20 rule of semantic development {nid 39LL}
* Use well defined and narrow use cases to demonstrate benefits of semantic approaches {nid 39LM}
* Having explicit vocabularies (classifiers) is a must in a distributed system {nid 39LN}
* The community should be included in the development and evolution of vocabularies {nid 39LO}
* It is critical to capture and evolve domain knowledge in a form that the community is comfortable with {nid 39LP}
* Transition from implicit domain knowledge to explicit encoding requires community consensus - and an organization to manage the consensus {nid 39LQ}

=== Other Observations / Lessons learned {nid 39LR} ===

* UML to OWL is a common requirement for legacy systems {nid 39LS}
* Starting from scratch is rare. {nid 39LT}
* Ontology patterns are very helpful, and encourage model reuse {nid 39LU}
* Semantic techniques work best when not compromised by implementation tradeoffs {nid 39LV}
* Semantic methods are faster to implement and easier to maintain {nid 39LW}
* Semantic approaches are particularly suited to systems with many complex constraints, rules, laws, with frequent changes {nid 39LX}
* Incremental implementation is possible through federation of datastores {nid 39LY}
* Ontologies are not always applied to enable reasoners - sometimes just as a more rigorous data modeling approach {nid 39LZ}
* Engineers turned ontologists often don't have the necessary background/skills {nid 39M0}
* Existing infrastructure supports traditional software development far better than large-scale ontology development {nid 39M1}
* There are many ontologies of dubious quality {nid 39M2}
* Service-oriented architectures allow separation of code and ontology updates {nid 39M3}
* Reasoner and query engine performance is highly dependent upon the exact formulation of rules and queries {nid 39M4}
* No single technology/tool currently provides the best solution across all large system use cases {nid 39M5}

== Conclusion {nid 39M6} ==

Engineering, in particular systems engineering, can garner benefits in many ways from the use of ontology. To more completely integrate ontology and ontological analysis into the engineering community and its processes, the skills most needed include a combined understanding of a scientific or engineering discipline and knowledge of ontological analysis and ontology-based technologies. To realize this combination, a transition based on existing paradigms and tools will need to be exploited in order to create the infrastructure needed for quality ontology development and more general use. In particular, the efforts by the Object Management Group (OMG) to provide a formal semantic underpinning to the Unified Modeling Language and its derivatives (e.g., SysML) are a step toward this goal. Moreover, organizations such as the International Council on Systems Engineering (INCOSE) are already engaged in fostering the adoption and growth of ontological analysis and ontology in their community. {nid 39M7}

Pragmatically, “big systems” and “big data”, especially from a cost perspective, have little technological recourse but to exploit the benefits to be gained from the use of ontology and ontological analysis. {nid 39M8}