OntologySummit2012: (Track-3) "Challenge: ontology and big data" Synthesis    (32DF)

Mission Statement:    (32DG)

The mission of this track is to identify appropriate objectives for an "Ontology and Big Data" challenge, prepare problem statements, identify the organizations and people to be advocates, and identify the resources necessary to complete a challenge. The goal will be to select a challenge showing benefits of ontology to big data.    (32DX)

see also: OntologySummit2012_BigDataChallenge_CommunityInput    (32EA)

The goal of "Meeting Big Data Challenges through Ontology" is to identify challenges that will advance ontology and semantic web technologies, increase applications, and accelerate adoption.    (38F8)

Current State    (38IT)

Ontology may tame big data, drive innovation, facilitate the rapid exploitation of information, contribute to long-lived and sustainable software, and improve the modeling of complicated systems.    (38FA)

Ontology might help big data, but here is why this usually fails    (38IU)

  1. It is now so easy to create ontologies that myriad incompatible ontologies are being created in ad hoc ways, leading to the creation of new semantic silos    (38FC)
  2. The Semantic Web framework as currently conceived and governed by the W3C (modeled on HTML) yields minimal standardization    (38FD)
  3. The more successful semantic technology is, the more we fail to achieve our goals    (38FE)

* Just as it’s easier to build a new database, so it’s easier to build a new ontology for each new project    (38FF)

* You will not get paid for reusing existing ontologies (Let a million ontologies bloom)    (38FG)

* There are no ‘good’ ontologies, anyway (just arbitrary choices of terms and relations …)    (38FH)

* Information technology (hardware) changes constantly, so it is not worth the effort of getting things right    (38FI)

Linked data lowers the cost of reusing data more than anything else. In addition, government data is already used quite widely, so we feel there are huge opportunities in promoting this in the Federal space.    (38FJ)

Current Uses / Examples    (38IV)

Systems Engineering Modeling Languages and Ontology Languages    (38FL)

Drive Innovation    (38FM)

Federation and Integration of Systems    (38FN)

Driving Innovation with Open Data - Creating a Data Ecosystem    (38FO)

1. Gather data    (38FP)

* from many places and give it freely to developers, scientists, and citizens    (38FQ)

2. Connect the community    (38FR)

* in finding solutions to allow collaboration through social media, events, platforms    (38FS)

3. Provide an infrastructure    (38FT)

* built on standards    (38FU)

4. Encourage technology developers    (38FV)

* to create apps, maps, and visualizations of data that empower people’s choices    (38FW)

5. Gather more data    (38FX)

* and connect more people    (38FY)

6. Energy.Data.gov works with challenges across the nation to integrate federal data and bring government personnel to code-a-thons    (38FZ)

7. Data Drives Decisions    (38G0)

* Apps transform data in understandable ways to help people make decisions    (38G1)

Rapid exploitation of information    (38G2)

1. In this world, the benefit is derived from the rapid pace at which new data and new data sources can be combined and exploited.    (38G3)

2. High-level reasoning over curated information: in this world, the benefit is derived from non-trivial inferences drawn over highly vetted data.    (38G4)

3. Many times people try to have both expressivity and scale. This is very expensive.    (38G5)

* Don’t be seduced by expressivity    (38G6)

* Just because you CAN say it doesn’t mean you SHOULD say it. Stick to things that are strictly useful to building your big data application.    (38G7)

* Computationally expensive    (38G8)

* Expressivity is not free. It must be paid for either with load throughput or query latency, or both.    (38G9)

* Not easily partitioned    (38GA)

* Higher expressivity often involves more than one piece of information from the abox – meaning you have to cross server boundaries. With lower expressivity you can replicate the ontology everywhere on the cluster and answer questions LOCALLY.    (38GB)

* A little ontology goes a long way    (38GC)

* There can be a lot of value just getting the data federated and semantically aligned.    (38GD)

4. Unfortunately, it is now so easy to create ontologies that myriad incompatible ontologies are being created in ad hoc ways, leading to the creation of new semantic silos    (38GE)

5. The Semantic Web framework as currently conceived and governed by the W3C (modeled on HTML) yields minimal standardization    (38GF)

6. The more successful semantic technology is, the more we fail to achieve our goals    (38IW)
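
The partitioning point under item 3 above can be illustrated with a minimal sketch. It assumes a deliberately low-expressivity ontology (a plain subclass hierarchy) that is small enough to copy to every server, so each server can answer "instances of class X" from its own slice of the instance data without contacting any other server. The class names, the `Shard` class, and the `query` helper are all hypothetical and are not taken from any system discussed in the track.

```python
# Hypothetical class hierarchy (the ontology); small enough to replicate
# on every node of the cluster.
SUBCLASS_OF = {
    "Sedan": "Car",
    "Car": "Vehicle",
    "Truck": "Vehicle",
}


def superclasses(cls):
    """Return cls followed by all of its ancestors in the hierarchy."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


class Shard:
    """One server: holds a slice of the instance data plus a full ontology copy."""

    def __init__(self, facts):
        self.facts = dict(facts)  # instance name -> asserted class

    def instances_of(self, cls):
        # Subclass inference needs one local fact plus the local ontology copy,
        # so no server boundary is crossed.
        return sorted(i for i, c in self.facts.items() if cls in superclasses(c))


def query(shards, cls):
    """Send the query to every shard and merge the locally computed answers."""
    found = []
    for shard in shards:
        found.extend(shard.instances_of(cls))
    return sorted(found)


shards = [
    Shard({"a1": "Sedan", "a2": "Truck"}),
    Shard({"b1": "Car", "b2": "Bicycle"}),
]
vehicles = query(shards, "Vehicle")
print(vehicles)
```

A rule that joins two or more instance facts (for example, a property chain) would not fit this pattern, because the facts involved may live on different shards; that is the cost of higher expressivity described above.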

Areas of Use (both current and future) / Areas of non-use    (38IX)

Ontology Design Patterns for Systems Engineering    (38GH)

Ontology for Software Production - Instantiating the ontology describes design of a particular system    (38GI)

Cyber-Physical Social Data Cloud Infrastructure    (38GR)

NIST & NICT Collaboration Project: R&D of a cloud platform specialized for collecting, archiving, organizing, manipulating, and sharing very large (big) cyber-physical social data    (38GS)

Use case 1 - Healthcare data publishing & sharing    (38GT)

Use case 2 - Location-aware service (e.g., disaster)    (38GU)

Globally monitoring and locally fencing (safe and rapid evacuation)    (38GV)

Information and Communication Technology (ICT)    (38GW)

Why a Materials Genome Initiative? Materials Are Complicated Systems; Modeling Is a Challenge    (38H0)

The Materials Genome Initiative is a new, multi-stakeholder effort to develop an infrastructure to accelerate advanced materials discovery and deployment in the United States. Over the last several decades there has been significant Federal investment in new experimental processes and techniques for designing advanced materials. This new focused initiative will better leverage existing Federal investments through the use of computational capabilities, data management, and an integrated approach to materials science and engineering.    (38H1)

Next steps    (38H2)

* File repository for first principles calculations    (38H3)

** File repository for CALPHAD calculations    (38H4)

** General data repository: prototype repository for data used in CALPHAD assessments    (38H5)

* Evaluation of data storage formats (e.g. markup language, hierarchical data format)    (38IY)

Accessibility (i.e., ease of use) / Impediments    (38IZ)

Ontology Quality for Large-Scale Systems    (38H7)

Ontology Tools and Training for Systems Engineers    (38H8)

Recommendations    (38J0)

Some needs and desires that big systems and systems engineering have of ontology are:    (38HA)

* More opportunities for social, economic and political participation    (38HG)

* Open platform for everyone, new public good    (38HH)

* Non-expert system    (38HI)

* Crowd sourcing, citizen science    (38HJ)

* Establish new information ecosystem to create new opportunities, services and jobs    (38HK)

* Benefit from cultural diversity    (38HL)

* Value-sensitive design    (38HM)

The European FuturICT (Information and Communication Technology) Paradigm is:    (38HN)

Big data might benefit from ontology technology, but this usually fails    (38HY)

Need a science of multi-level complex systems!    (38IA)

Linked Open Data (LOD)    (38J1)

Linked Open Data (LOD) is hard to create    (38IC)

** Key idea: Reduce problem complexity by having the user (1) enter a simple graph, and (2) annotate it with words and phrases    (38IL)
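
One possible reading of this key idea is sketched below; it is an illustration only, not the tool the track had in mind. The user supplies a small graph of edges between short node names, then attaches a word or phrase to each node, and a helper combines the two into N-Triples-style lines. The node names, the `http://example.org/` namespace, and the `slug` and `to_triples` helpers are all assumptions made for the example.

```python
# Step 1: the user enters a simple graph as (source, edge label, target).
# All names below are invented for illustration.
edges = [
    ("n1", "locatedIn", "n2"),
    ("n1", "measures", "n3"),
]

# Step 2: the user annotates each node with a word or phrase.
annotations = {
    "n1": "weather station",
    "n2": "Boulder, Colorado",
    "n3": "air temperature",
}

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def slug(phrase):
    """Turn a phrase into a URI-safe local name (non-alphanumerics become '_')."""
    return "".join(ch if ch.isalnum() else "_" for ch in phrase.strip()).strip("_")


def to_triples(edges, annotations, base=BASE):
    """Combine the graph and its annotations into N-Triples-style lines.

    Labels are emitted as-is; this sketch does not escape quotes in phrases.
    """
    uri = {node: f"<{base}{slug(text)}>" for node, text in annotations.items()}
    lines = []
    for node, text in annotations.items():
        lines.append(f'{uri[node]} <{RDFS_LABEL}> "{text}" .')
    for src, label, dst in edges:
        lines.append(f"{uri[src]} <{base}{label}> {uri[dst]} .")
    return lines


triples = to_triples(edges, annotations)
for line in triples:
    print(line)
```

The user never writes a URI or a triple directly, which is the complexity reduction the key idea describes; minting stable, shared identifiers instead of ad hoc ones would be the harder part left out of this sketch.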

References    (32L7)

Links    (32TX)

Documents    (32U2)

 maintained by the Track-3 champions: ErnieLucier & MaryBrady ... please do not edit    (32DH)