
[oor-forum] definition: registry vs. repository; goals

To: <oor-forum@xxxxxxxxxxxxxxxx>
From: "Obrst, Leo J." <lobrst@xxxxxxxxx>
Date: Mon, 21 Jan 2008 14:20:51 -0500
Message-id: <9F771CF826DE9A42B548A08D90EDEA8002BC4EAA@xxxxxxxxxxxxxxxxx>


Here are a couple of insertions about 1) registries vs. repositories and 2) goals/phases for an RDF/OWL repository. They come from an effort we did in 2006 and might jumpstart our discussion here. I will see if we can provide more from the original document.

Also, I think we should review the progress that other projects such as XMDR (XMDR.org) have made along these lines.

Here is the definition we used. We might want to make the distinctions clear up front.

1) Registry vs. Repository

There is often confusion over the distinction between a registry and a repository. Both are data structures or stores. For the purposes of this paper, a registry is defined to be a data structure in which data, metadata, knowledge, or semantic objects are listed as being available, along with who placed them there and their conditions of use. A repository is a data structure in which data, metadata, knowledge, or semantic objects and related artifacts are placed and from which they can be accessed; typically repositories include management software. A registry can therefore be considered a portion of a repository: a listing or table of contents for the repository (which can be either centralized or distributed), rather than its entire contents. A simple example of a registry is the Windows Registry, which records configuration information about installed programs, rather than holding the entire set of programs themselves and their data.
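The distinction can be sketched in a few lines of Python. This is only an illustration of the definitions above, not a proposed design; the class names, field names, and the sample entry are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RegistryEntry:
    """What a registry records: that an object is available, who placed it, and its terms."""
    name: str
    submitter: str
    conditions_of_use: str


@dataclass
class Repository:
    """Holds the artifacts themselves; the registry is its table of contents."""
    _artifacts: dict = field(default_factory=dict)  # name -> artifact content
    _registry: dict = field(default_factory=dict)   # name -> RegistryEntry

    def deposit(self, entry: RegistryEntry, content: str) -> None:
        # Placing an artifact stores its content and lists it in the registry.
        self._artifacts[entry.name] = content
        self._registry[entry.name] = entry

    def registry(self) -> list:
        # Listing only: entries are returned, never the artifact content.
        return sorted(self._registry.values(), key=lambda e: e.name)

    def retrieve(self, name: str) -> str:
        # Access to the artifact itself is a repository function.
        return self._artifacts[name]


repo = Repository()
repo.deposit(
    RegistryEntry("country-codes", "example-submitter", "open"),
    "@prefix ex: <http://example.org/> . ex:US a ex:Country .",
)
listing = [entry.name for entry in repo.registry()]
```

Here `repo.registry()` answers "what is available, from whom, and under what conditions", while `repo.retrieve()` hands back the object itself.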

2) Goals

The purpose of an RDF/OWL Repository is to provide an architecture and an infrastructure that supports a) the creation, sharing, searching, and management of ontologies, and b) linkage to database and XML Schema structured data and documents. Complementary goals include fostering the Semantic Web community, the identification and promotion of best practices, and the provision of services relevant to the RDFS and OWL ontologies and RDF instance stores. In addition, because the Semantic Web languages are represented as formal logics, automated semantic interpretation of content expressed in Semantic Web languages and inference over this content is enabled. Such repositories ultimately will support a broad range of semantic services and applications of interest to enterprises and communities.

Achieving these goals will help reduce semantic ambiguity whenever and wherever information is shared, thereby allowing information to be located, searched, categorized and exchanged with a more precise expression of its content and meaning. The artifacts of the repository will provide a semantic grounding for diverse formats and domains, ranging from the conceptual domains and disciplines of communities to technical schemas such as WSDL, UDDI, RSS, and XML Schema. Perhaps most importantly, the repository will enable wide-scale knowledge re-use and reduce the need to re-invent the wheel to define concepts and relationships that are already understood. An example of the latter is the "portal" that contains manually hard-coded information derived from another source.

These goals cannot all be achieved at once, and the effort must track the evolution of best practices as well as technology itself. It is also good system development practice to bound complexity by defining a system in terms of a series of short-term, achievable objectives. For this reason, as for other such initiatives, it is envisioned that the RDF/OWL Repository will be developed in a series of phases, proceeding from the simple to the complex, with achievable goals that capitalize on previous experience and the emergence of technology over time. It is important to note that for any given phase, planning and prototyping are always in progress for subsequent phases. These phases are notionally described in the following sections.

Phase 1: Storage, Access, Business Processes, and Tools for Ontologies.

In the first phase three major aspects are addressed.

The first aspect emphasizes storing, searching, and locating ontologies. The architecture consists of 1) an index that contains metadata about an ontology or an RDF store, 2) the ontology and RDF stores, and 3) user services to support storing, searching, and locating ontologies.
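As a rough illustration of how these three pieces fit together, here is a minimal Python sketch. The metadata fields, the class and method names, and the example.org addresses are assumptions for the example; the core metadata requirements themselves are still to be defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyMetadata:
    iri: str          # identifier of the ontology or RDF store
    title: str
    keywords: tuple   # search terms describing the content
    store_url: str    # where the content is actually held


class OntologyServices:
    """User services over 1) a metadata index and 2) the stores themselves."""

    def __init__(self):
        self.index = {}   # iri -> OntologyMetadata
        self.stores = {}  # store_url -> {iri: content}

    def store(self, meta, content):
        # Put the content in its store and record its metadata in the index.
        self.stores.setdefault(meta.store_url, {})[meta.iri] = content
        self.index[meta.iri] = meta

    def search(self, keyword):
        # Search consults only the index, never the stored content.
        kw = keyword.lower()
        return sorted(
            m.iri for m in self.index.values()
            if kw in (k.lower() for k in m.keywords)
        )

    def locate(self, iri):
        # Locating an ontology means finding which store holds it.
        return self.index[iri].store_url


services = OntologyServices()
services.store(
    OntologyMetadata(
        iri="http://example.org/onto/country-codes",
        title="Country Codes",
        keywords=("country", "geography"),
        store_url="http://store-a.example.org/",
    ),
    "ex:US a ex:Country .",
)
hits = services.search("Geography")
```

Because search and locate touch only the index, the stores can sit at different addresses without changing how users find things.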

The second aspect addresses business processes. Policies will be developed to guide the naming, management and partitioning of ontologies and the collection of simple metrics to help understand how the Repository is used. Ontology development will focus on the representation of simple taxonomies, on so-called facet ontologies (i.e., simple property hierarchies; typically facets define terminological dimensions as in thesauri), or on more, depending on users' needs. Important subject areas during this phase include general reference information, such as Country Codes and similar categories. Core metadata requirements for ontologies will be refined during this period. Translation services will be defined or provided for simple cases, e.g., OWL to HTML.

A very important business process function is the notification of ontology changes to ontology clients (either humans or automated services). During Phase 1, various mechanisms, such as the use of RSS feeds, should be investigated, as well as the appropriate vocabulary needed to communicate these changes.
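To make the RSS option concrete, the sketch below builds an RSS 2.0 feed of ontology changes with Python's standard library. The change vocabulary here ("revised" and so on, carried in the `category` element) is purely a placeholder, since the appropriate vocabulary is exactly what remains to be investigated; the function name and example.org addresses are likewise assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def change_feed(repo_title, repo_link, changes):
    """Return an RSS 2.0 document (as a string) with one item per ontology change."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = repo_title
    ET.SubElement(channel, "link").text = repo_link
    ET.SubElement(channel, "description").text = "Ontology change notifications"
    for change in changes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{change['kind']}: {change['ontology']}"
        ET.SubElement(item, "link").text = change["ontology"]
        # Placeholder change vocabulary; the real terms are an open question.
        ET.SubElement(item, "category").text = change["kind"]
        # RSS dates use the RFC 822 style that format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(change["when"])
    return ET.tostring(rss, encoding="unicode")


feed = change_feed(
    "Example Repository",
    "http://example.org/repo",
    [{
        "ontology": "http://example.org/onto/country-codes",
        "kind": "revised",
        "when": datetime(2008, 1, 21, tzinfo=timezone.utc),
    }],
)
```

Human clients could read such a feed in any feed reader, while automated clients would poll it and act on the `category` value.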

The third aspect addresses downloadable tools appropriate to use in the development and maintenance of ontologies and related functions. An example is an ontology editor: some editors, like Protégé, are relatively mature, are open source, and are generally available.

Initially, the architecture itself should in principle support a de-centralized storage of ontologies and data stores. To facilitate searching and to reduce response time for search and discovery, the ontology metadata index itself can be centralized. Research during this period will focus on ontology editing tools, ontology mapping tools, the linking of ontologies to RDF "instance" stores of facts, ontology modularity and composability, representation of numerics, and the scalability of Semantic Web technology.

Achieving the Phase 1 goals will itself be a significant accomplishment. The Repository could, for example, precisely define the concepts that are usually associated with a subject authority, such as geographic region codes, various subject matter codes and categories, and taxonomies used by search engines to identify broader and narrower terms.
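As a small illustration of the broader/narrower case, a simple taxonomy can be held as a mapping from each term to its broader term. The geographic terms below are made-up sample data, and the single-parent structure is a simplifying assumption.

```python
# narrower term -> broader term (each term has at most one broader term here)
BROADER = {
    "Virginia": "United States",
    "United States": "North America",
    "Canada": "North America",
    "North America": "World",
}


def broader_terms(term):
    """All broader terms of `term`, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def narrower_terms(term):
    """All narrower terms of `term`, depth first, siblings in alphabetical order."""
    result = []
    for child in sorted(t for t, b in BROADER.items() if b == term):
        result.append(child)
        result.extend(narrower_terms(child))
    return result
```

A search engine could use `narrower_terms("North America")` to widen a query to every region beneath it, or `broader_terms("Virginia")` to offer more general categories.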

Phase 2: Better Access, Search, Validation, Importing, and Mapping of Ontologies

Phase 2 emphasizes providing a broader array of machine services for access and use of repository contents. These will include validation and searching of ontology contents with path-type queries and conceptual queries. Ontologies will be developed to describe or even generate database schema and interfaces traditionally defined only by the use of a hardcopy Interface Control Document. Ontology domains will expand to include service and rule-based descriptions. The facet ontologies developed during Phase 1 will be used to define and describe integrating, conceptual ontologies in various domains. Search technology will use ontologies to search over RDF data stores.
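A path-type query can be pictured as following a sequence of properties through the triples of a store. The sketch below uses plain Python tuples in place of a real RDF store; the `ex:` names and sample facts are invented for the example.

```python
TRIPLES = {
    ("ex:Report42", "ex:author", "ex:Alice"),
    ("ex:Alice", "ex:memberOf", "ex:OrgA"),
    ("ex:OrgA", "ex:locatedIn", "ex:US"),
    ("ex:Report43", "ex:author", "ex:Bob"),
    ("ex:Bob", "ex:memberOf", "ex:OrgB"),
}


def follow_path(triples, start, path):
    """Return the nodes reached from `start` by following each predicate in `path` in order."""
    frontier = {start}
    for predicate in path:
        # Step from every node reached so far across edges labelled `predicate`.
        frontier = {o for (s, p, o) in triples if p == predicate and s in frontier}
    return frontier


country = follow_path(TRIPLES, "ex:Report42", ["ex:author", "ex:memberOf", "ex:locatedIn"])
```

The same query from `ex:Report43` returns an empty set, because no location is recorded for `ex:OrgB`.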

During Phase 2, special attention will be paid to issues including ontology importing, composition, and mapping. Downloads of additional semantic tools will be provided in Phase 2. These tools will be tested for interoperability with Repository services and other tools. An example is an OWL editor, which should produce OWL code that a validator can recognize as valid OWL.

Phase 3: Distributed Query Support and Automated Inference

Phase 3 marks a shift in the use of the Repository. In Phase 3, use of the Repository to automatically support inferencing and intelligent distributed query will be expanded. Web crawling, indexing, and ontology-aided classification/categorization will be more prominent. The use of Repository ontologies to access and search "back-end" databases will be implemented. Mechanisms will be added to automatically notify human and machine subscribers of changes in the repository contents. Security features will be added to support restrictions by community of interest. Translation services for OWL to other target languages, e.g., UML, will be expanded. It is expected that a class of ontologies will be identified as approved or certified during this phase.
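As a toy illustration of the kind of inference meant here, the sketch below applies two standard RDFS rules (subclass transitivity, and inheritance of rdf:type along rdfs:subClassOf) until nothing new is derived. It is a naive fixed-point loop over plain tuples, not how a production reasoner would work, and the `ex:` facts are invented.

```python
def infer(triples):
    """Close a set of triples under rdfs:subClassOf transitivity and type inheritance."""
    inferred = set(triples)
    while True:
        subclass = [(s, o) for (s, p, o) in inferred if p == "rdfs:subClassOf"]
        new = set()
        for (s, p, o) in inferred:
            if p in ("rdf:type", "rdfs:subClassOf"):
                for (c, d) in subclass:
                    if c == o:
                        # s is an instance (or subclass) of c, and c is a subclass of d.
                        new.add((s, p, d))
        if new <= inferred:
            return inferred  # fixed point reached: nothing new was derived
        inferred |= new


FACTS = {
    ("ex:Virginia", "rdf:type", "ex:State"),
    ("ex:State", "rdfs:subClassOf", "ex:Region"),
    ("ex:Region", "rdfs:subClassOf", "ex:Place"),
}
closure = infer(FACTS)
```

A query for everything of type `ex:Place` would then find `ex:Virginia`, even though that fact was never stated directly.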

Phase 4: Expanded Community Support

Phase 4 is an expansion of the capabilities developed in previous Phases to support large-scale RDF and OWL stores and federated search across community resources. The ontology repository will more actively support the needs of community information interoperability and large-scale information exchange mechanisms as identified by the community.

Dr. Leo Obrst       The MITRE Corporation, Information Semantics
lobrst@xxxxxxxxx    Information Discovery & Understanding, Command and Control Center
Voice: 703-983-6770 7515 Colshire Drive, M/S H305
Fax: 703-983-1379   McLean, VA 22102-7508, USA
