Re: [oor-nsf07601] Draft Proposal: Executive Summary

To: <oor-nsf07601@xxxxxxxxxxxxxxxx>
From: "Obrst, Leo J." <lobrst@xxxxxxxxx>
Date: Fri, 31 Oct 2008 02:07:07 -0400
Message-id: <9F771CF826DE9A42B548A08D90EDEA8003E6D487@xxxxxxxxxxxxxxxxx>

[ *** This list is for use by the OOR nsf-07061 team only - please do not share 
or forward message without consent from the team! *** ]

This is a very good summary, Ken. Thank you and K for taking the lead on this.

Dr. Leo Obrst, MITRE, Information Semantics, lobrst@xxxxxxxxx, 703-983-6770

----- Original Message -----
From: oor-nsf07601-bounces@xxxxxxxxxxxxxxxx <oor-nsf07601-bounces@xxxxxxxxxxxxxxxx>
To: oor-nsf07601 <oor-nsf07601@xxxxxxxxxxxxxxxx>
Sent: Fri Oct 31 01:19:41 2008
Subject: [oor-nsf07601] Draft Proposal: Executive Summary

I have prepared a draft for the proposal.  The following is the Executive
Summary.  Other material is on the OOR wiki, but it is not in the proper
format for the proposal yet.  I will post the material on the OOR wiki as
I get it properly formatted.

-- Ken

The science and engineering communities are producing very large data sets that
are also increasingly complex and diverse.  These data sets are very well
suited for particular narrowly-defined, discipline-specific purposes.  In
principle, these data sets could be used for solving more broadly-defined
scientific problems such as understanding whole organisms, ecosystems and human
populations.  However, incorporating multiple data types from multiple sources
to solve these problems remains a significant challenge.  For example, a
testable macroscopic biological hypothesis might involve the effect of
environmental or climatic change on the genomic makeup of a given organism.  As
another example, a macroeconomic hypothesis concerning the most efficient use
of resources to improve the quality of life in a region will depend on cultural
and environmental knowledge as well as economic statistics.

While the data sets that are currently being developed typically engender the
greatest level of enthusiasm among the communities that are creating them, data
sets created in the past can have equal importance for related communities.
Biodiversity is a case in point.  The painstaking observations by generations
of biologists over centuries represent an important resource for modern ecology
and biodiversity studies, but those observations are locked in old textbooks
and monographs that are not easily accessed by modern computing technology.
The problem is not just the differences in recording media (paper versus disks)
but also the enormous changes in terminology over time.  Current data sets run
the risk of an even more rapid obsolescence as the meaning of the data fields
is forgotten even by the individuals who introduced them.

We believe in the promise of semantic technologies based on logic, databases
and the Semantic Web as a means of addressing the problems of meaningful access
to and integration of data over decades and centuries.  Such technologies
enable distinguishable, computable, reusable, and sharable meaning of
information artifacts, including data sets, documents and services.  We also
believe that making this vision a reality requires additional supporting
resources, and that these resources should be open and extensible and should
provide common services over the ontologies.  Our belief in this vision is
based not only on current experience but also on the deep philosophical
foundations that underlie modern ontological engineering.

We propose to develop an open ontology repository (OOR) of controlled
vocabularies and knowledge models that have been encoded in RDF, OWL, and other
knowledge representation languages.  More specifically, we propose to develop
an open repository for the metadata and data sets of the following communities:

1. Biology, especially the genomics, proteomics and other "omics" communities.
This will be based on the highly successful BioPortal repository.

2. Biodiversity, especially the species pages in the Encyclopedia of Life.

3. Climate and environmental communities (including both natural environments
and built environments).

4. Human culture and sociology.

The data sets of these communities share a number of characteristics that make
them well suited to the proposed OOR:

1. They represent large data sets that are of considerable importance to their
respective communities.

2. Some of the data sets, especially those of the biodiversity community,
represent data that is very old, sometimes centuries old, yet still of
considerable value.

3. The integration of these data sets opens up exciting research opportunities
not only for the natural sciences but also for environmental and social

4. The data sets have complex semantics, and there is no clear distinction
between data and metadata.  As a result, modern relational database technology
is poorly suited for modeling the data sets.

While these data sets provide a compelling case for the proposed OOR, the
prospect of broader impacts is even more compelling.  As an integral part of
the proposed project, we intend to foster a vigorous educational outreach
program to bring other data-intensive research communities into the OOR
initiative.  Since the OOR will be an open, federated architecture and
infrastructure, communities will be able to use it both to host their own
ontologies and to adapt previously established ontologies for their own
purposes.

To address the issue of long-term sustainability, we propose to develop a new
paradigm for maintaining semantic linkages available through the
Internet. Specifically, we will develop a federated knowledge repository that
can collectively correct for multiple points of failure and can foster
collaborative stewardship of scientific knowledge.  Particular emphasis will be
given to the development of technological solutions that build on existing,
proven architectures for maintaining biological data (e.g., BioPortal, the OBO
Foundry and the International Nucleotide Sequence Database Collaboration) and
abiotic data (e.g., the National Climatic Data Center), as well as on standards
for metadata and services (e.g., ISO XMDR, WSDL and UDDI).

Message Archives: http://ontolog.cim3.net/forum/oor-nsf07601/ 
Shared Files: http://ontolog.cim3.net/file/community/project/OOR/nsf07601/
Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OpenOntologyRepository_Proposal/Nsf_07601

