Comments in-line.
Patrick Cassidy
MICRA, Inc.
908-561-3416
cell: 908-565-4053
cassidy@xxxxxxxxx
From:
ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx]
On Behalf Of Neil Custer
Sent: Thursday, August 28, 2008 2:59 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Foundation Ontology
Examples in the metadata of a
database schema can be helpful as well. This is why schemas should always
be well-annotated. Thus, if I were defining a month field in a database and
named it Accrual_Month with a format of CHAR(3), I would also include examples
such as "Jan", "Sep", or "Oct", but not all-caps forms such as "NOV" or "DEC".
[[PC]] In an ontology
there will be very few types whose instances consist of a mere enumeration of
strings. January, February, etc. are themselves types of recurring time
interval, and have some well-defined properties that should be specified –
i.e. they are first-class entities, not strings. Thinking of an ontology
as just another form of database schema is likely to lead to inadequate
semantics for applications that need something more than a traditional database.
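For instance, here is a minimal sketch with rdflib (the namespace and names are
purely illustrative) of treating January as a first-class type of recurring time
interval, with its own properties, rather than as the string "Jan":

# A minimal sketch, assuming rdflib; the namespace and names are invented.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, OWL, XSD

EX = Namespace("http://example.org/calendar#")
g = Graph()
g.bind("ex", EX)

# January is a class (a type of recurring time interval), not the string "Jan".
g.add((EX.Month, RDF.type, OWL.Class))
g.add((EX.Month, RDFS.subClassOf, EX.RecurringTimeInterval))
g.add((EX.January, RDF.type, OWL.Class))
g.add((EX.January, RDFS.subClassOf, EX.Month))

# Well-defined properties of the type can be stated explicitly.
g.add((EX.January, EX.monthNumber, Literal(1, datatype=XSD.integer)))
g.add((EX.January, EX.abbreviation, Literal("Jan")))

# A particular month, e.g. January 2008, is then an instance of the type.
g.add((EX.January2008, RDF.type, EX.January))

print(g.serialize(format="turtle"))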
However, you've confused me as to how examples are useful to test the validity
of the ontology structure. I thought consistency checks were to see if
instances were possible based on the structure. If what you are alluding to
is the "human" check, then example instances can be entered during
that validity check performed during construction and/or maintenance of the
ontology, correct?
[[PC]] One form of
consistency check does look at “disjoint” statements: where some type is a
subtype of disjoint types, any instance of it would be inconsistent. But
there are other kinds of consistency checks that pick up errors in various
kinds of usage, including some subtle ones that propagate down an inheritance
chain from constraints at a high-level type. Instances are useful for picking
up such subtle issues. As a practical matter, having examples of how
instances are described can make an ontology easier to use by providing
templates that can be copied and modified for instances that are close in
meaning to existing types.
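As a toy illustration (not any particular reasoner, and all type names are
invented), the disjointness check described above can be sketched like this:

# Toy sketch of the check: a type that is a subtype of two types declared
# disjoint cannot have any consistent instance.
subtype_of = {
    "Cat": {"Mammal"},
    "Robot": {"Artifact"},
    "RoboCat": {"Cat", "Robot"},          # suspicious multiple inheritance
    "Mammal": {"LivingThing"},
    "Artifact": {"NonLivingThing"},
}
disjoint_pairs = {frozenset({"LivingThing", "NonLivingThing"})}

def ancestors(t):
    """All supertypes of t, including t itself."""
    seen, stack = {t}, [t]
    while stack:
        for parent in subtype_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def unsatisfiable_types():
    """Types whose set of ancestors contains a declared disjoint pair."""
    bad = []
    for t in subtype_of:
        ups = ancestors(t)
        if any(pair <= ups for pair in disjoint_pairs):
            bad.append(t)
    return bad

print(unsatisfiable_types())   # ['RoboCat']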
Also, let's say you want to retain some sample instances in the ontology
-- does that have to mean that all instances are directly stored as
part of the ontology?
[[PC]] No, I specifically
mentioned *examples*, and in my first sentence stated that the application data
need not be part of the ontology. In the ontology systems I have worked
with there is no need to differentiate the instances from the type structure of
the ontology – in fact, the types themselves are instances of metatypes,
and both types and instances are smoothly handled by the implementation.
But if one wants to use an ontology for multiple specialized and separate
applications, it is simple enough to have the instances in each application
defined in a separate file that serves as an extension to the ontology of
types. One could also have all example instances (created for the
purposes I mentioned) in an extension, so they don’t *have to* be part of
the “ontology”, if you want to differentiate the T-box and A-box.
But the point I am making is that the ontology should be tested with a number
of instances, and having instances that are well-known entities from the real
world helps to “ground” the ontology in its intended external referents
so that it is not merely some closed set of symbols that could mean anything at
all, or nothing. The point is to make the ontology easier for the human
ontologist to understand.
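As a minimal sketch of the extension-file arrangement just described, assuming
rdflib and hypothetical file names, the type structure (T-box) and the example
instances (A-box) can live in separate files and be merged only when needed:

# A minimal sketch; rdflib is assumed and the file names are hypothetical.
from rdflib import Graph

ontology = Graph()
ontology.parse("core-types.owl", format="xml")        # hypothetical T-box file
ontology.parse("example-instances.rdf", format="xml") # hypothetical A-box extension

# The merged graph can now be queried as a single ontology.
print(len(ontology), "triples after merging the type and instance files")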
This is wildly different from
data in a database or information encoded in an XML document that is built
according to a schema. In both cases, whether a database schema or an XML
schema, the schema is the intension of the data to be contained in the
construct, whereas the rows in the tables and the information in the XML
document are the extension. Why can't this same concept apply to ontology
construction?
[[PC]] As I said, if you feel that you must, you could structure it that way,
but if you use metatypes that have instances that are types of things in your
database, it will not be used quite like a traditional database. Some of
the table names will wind up as “data” in some of the tables.
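A toy sqlite3 sketch (all names invented) of that last point, where the type
level ends up as rows of data rather than as fixed table names:

# Toy sketch: when types are themselves instances of metatypes, the "type"
# level is data, not schema.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE types (name TEXT PRIMARY KEY, supertype TEXT)")
db.execute("CREATE TABLE instances (name TEXT PRIMARY KEY, "
           "type TEXT REFERENCES types(name))")

# The type hierarchy is stored as rows, not as table names.
db.executemany("INSERT INTO types VALUES (?, ?)",
               [("Month", "RecurringTimeInterval"), ("January", "Month")])
db.executemany("INSERT INTO instances VALUES (?, ?)",
               [("January2008", "January")])

for row in db.execute("SELECT i.name, i.type, t.supertype "
                      "FROM instances i JOIN types t ON i.type = t.name"):
    print(row)   # ('January2008', 'January', 'Month')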
==========================
On Thu, Aug 28, 2008 at 10:16 AM, Patrick Cassidy <pat@xxxxxxxxx> wrote:
Neil,
A brief reply on
one point you raised:
[NC] I've thought for a while that having instance values included in the
structure of an ontology seemed an inefficient way to put an effective
knowledge representation system together. In a somewhat simplified
example, when using a relational database, one doesn't look at the database
schema at the same time one looks at the information filling the rows in tables.
The only time they need to know the names of those tables and columns is when
they are building queries, views, and reports.
Although the data of interest in
applications need not be directly part of an ontology, I believe that it is
useful to include some well-known instances of things as examples of types, so
as to make the meanings of terms clear. This is easiest with
physical objects and historical events. Instances are also useful
as a means to test the validity of the ontology structure, since some
consistency tests do not show errors until instances are described with the
relations.
Pat
Patrick Cassidy
MICRA, Inc.
908-561-3416
cell: 908-565-4053
cassidy@xxxxxxxxx
Thanks for the reply, John.
You certainly raise valid and interesting points and I have no doubt that
Google developers know what they are doing. I also certainly understand
that any compiled data is going to be more compact in a binary format and as a
result, much quicker to process as well. I will not dispute that too many
developers try to use XML as a "silver bullet"--there are certainly
many, many applications that either process large volumes of data or are very
time-critical. In those cases the data packages certainly must be clearly
defined and compiled so that they are optimized for pure processing speed,
as in your colleague's telecom app.
A few additional thoughts:
Google developers are not building a Foundation Ontology on which they expect
the rest of the knowledge representation world to base their work (at least
as far as we know at this time). If they are, perhaps they could share
their insight with us...
Given the similarities in the two formats as posed in your original post, one
could fairly easily build a conversion application that applies the rules for
constructing .proto files from an .owl or .rdf file. Perhaps the central
issue is the separation of the information from the structure. This would
(in my limited experience of working with ontologies) be the equivalent of
separating the instances from the knowledge representation structure.
However, my limited experience with knowledge bases may be the source of my
confusion here.
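As a rough sketch of such a conversion application, assuming rdflib and a
much-simplified set of mapping rules, one might read class and
datatype-property declarations from an .owl file and emit protocol-buffer
message skeletons:

# Rough sketch only; rdflib is assumed, the mapping rules are simplified,
# and the input file name is hypothetical.
from collections import defaultdict
from rdflib import Graph
from rdflib.namespace import RDF, RDFS, OWL, XSD

XSD_TO_PROTO = {XSD.string: "string", XSD.integer: "int64",
                XSD.boolean: "bool", XSD.double: "double"}

def owl_to_proto(path):
    g = Graph()
    g.parse(path, format="xml")
    fields = defaultdict(list)
    # Map each datatype property to a field on its domain class.
    for prop in g.subjects(RDF.type, OWL.DatatypeProperty):
        domain = g.value(prop, RDFS.domain)
        rng = g.value(prop, RDFS.range)
        if domain is not None:
            fields[domain].append((XSD_TO_PROTO.get(rng, "string"),
                                   prop.split("#")[-1]))
    lines = []
    # Emit one message skeleton per declared class.
    for cls in g.subjects(RDF.type, OWL.Class):
        lines.append("message %s {" % cls.split("#")[-1])
        for i, (ptype, pname) in enumerate(fields.get(cls, []), start=1):
            lines.append("  optional %s %s = %d;" % (ptype, pname, i))
        lines.append("}")
    return "\n".join(lines)

print(owl_to_proto("sample-ontology.owl"))   # hypothetical input file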
I've thought for a while that having instance values included in the structure
of an ontology seemed an inefficient way to put an effective knowledge
representation system together. In a somewhat simplified example, when
using a relational database, one doesn't look at the database schema at the
same time one looks at the information filling the rows in tables. The only time
they need to know the names of those tables and columns is when they are
building queries, views, and reports.
I wholeheartedly agree with you that "the data descriptors should be
logically accessible when needed, [AND] independent of their physical
location." With the value of the knowledge representation structure
at the heart of an ontology, the representation of a concept in the ontology
needs to be clear to the user of the structure when a specific application of
that knowledge is being tapped. That being the case, there should be a DOI
or some other persistent identifier that a knowledge application builder can
use to refer to a concept without having to pull an entire ontological
structure (or even, necessarily, parts of it) into their application.
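A minimal sketch of that idea, assuming rdflib and a hypothetical concept IRI
that is published as linked data: the application dereferences only the
identifier for the concept it needs, rather than loading the whole ontology:

# Minimal sketch; the concept IRI is hypothetical, and dereferencing it
# only works when the server actually publishes RDF at that address.
from rdflib import Graph, URIRef

concept = URIRef("http://example.org/ontology#AccrualMonth")   # hypothetical IRI

g = Graph()
g.parse(str(concept))   # many linked-data servers return just a description of the resource

# Keep only the statements about the concept itself.
for predicate, obj in g.predicate_objects(subject=concept):
    print(predicate, obj)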
On Thu, Aug 28, 2008 at 1:33 AM, John F. Sowa <sowa@xxxxxxxxxxx> wrote:
Neil,
I would give the Google developers credit for knowing what they are
doing, especially in version 2.0 of a format that they use for very
large volumes of data. As they say, XML is good for data embedded
in text documents. Otherwise, there are better choices.
> The site states: "this message is encoded to the protocol buffer
> binary format (the text format above is just a convenient
> human-readable representation for debugging and editing)".
That merely means that the source format is the one that should
be used for development (i.e., debugging, editing, and sharing).
> This means that no one other than a programmer will ever be able
> to see that "more readable" form unless the developer goes to the
> trouble to expose it in their application in that format.
That is true of any notation that is compiled for further processing.
Developers use the source form. For any application, no one other
than a programmer ever sees anything except the results that the
program generates.
> This format does not express the structure with the message content,
> one of the greatest strengths of XML and the languages based on it
> such as RDF and OWL.
The data descriptors should be logically accessible when needed,
independent of their physical location. Following is an excerpt from
a note I sent to a different forum in which similar issues were raised.
John
______________________________________________________________________
> A general return to non-self-documenting formats popular
> when processors were much more limited than they are today
> seems like a step in the wrong direction.
The crucial point is *accessible*. The supporting tools must make
the descriptors accessible whenever they're needed, but that does
not imply that they must be physically stored and duplicated with
every data instance.
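A small toy illustration (not Google's format; the record layout is invented)
of that separation: the field descriptors are stored once, each record is
packed compactly, and an equivalent self-describing XML encoding repeats the
descriptors in every instance:

# Toy comparison only; the record layout and field names are invented.
import struct
import xml.etree.ElementTree as ET

descriptor = [("id", "I"), ("price_cents", "I"), ("qty", "H")]   # stored once
fmt = "<" + "".join(code for _, code in descriptor)

record = {"id": 42, "price_cents": 1999, "qty": 3}

# Compact encoding: the descriptors are not repeated with the data.
packed = struct.pack(fmt, *(record[name] for name, _ in descriptor))

# Self-describing encoding: every instance carries the field names.
elem = ET.Element("record")
for name, _ in descriptor:
    ET.SubElement(elem, name).text = str(record[name])
xml_bytes = ET.tostring(elem)

print(len(packed), "bytes packed vs", len(xml_bytes), "bytes as XML")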
Furthermore, the processors have never become so fast that it's
OK to be wasteful. If you're processing thousands of data elements,
the time may be negligible compared to human response time. But if
you have to make a decision to switch data streams in milliseconds
or microseconds, many applications cannot afford to use XML.
If you are storing and processing petabytes or more of data, a factor
of 2 or 10 can make a difference of many millions of dollars in
hardware cost. Google and many other companies know that.
But if you really need to keep the descriptors physically close, then
there is an alternative that is more expressive than RDF(S) or OWL
and about a factor of 10 more compact. It's also an ISO standard:
Common Logic in either the CGIF or CLIF notations.
My colleague Arun Majumdar demonstrated that point on a project as
a consultant for a major telecom. They had tried to use XML for data
encoding in a task that required switching data streams in millisecond
times. But they couldn't respond within the time limits. Arun
replaced the XML encoding with conceptual graphs and met the time
constraints. The CG version went into production.
John
--
Neil
--
Neil
Life is Great!-- It sure beats the alternative!!
So live better - live longer! See our exclusive health and nutrition products
at:
http://ncuster.qhealthzone.com
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx