MW: Well, I’m not sure there is a generally agreed definition of what an ontology is [[Sjir2: indeed, I agree with you, and I believe this forum should admit that clearly and start work to get to a series
of definitions that can be assigned to the widely varying kinds of “ontologies” mentioned in this forum.]], but we are not talking about the philosophical study of what exists. [[Sjir2: I agree.]] My definition for the purposes of this summit is:
A formal (i.e. computer processable) representation of (some of) the things that exist and (some of) the rules that govern them.
[[Sjir2: Another proposal: A complete and truly conceptual (in the sense of ISO TR9007) ontology is a formal (i.e. computer processable) representation of
a. the kinds of things considered within scope of a certain ontology,
b. the kinds of facts about instances of these kinds of kinds and
c. all the associated integrity rules about the fact populations and fact population transitions.
d. There is always a human understandable representation (in a CNL), that is extended with a set of all relevant concept definitions.]]
MW: As I said, there is no general consensus on the definition of ontology, and you immediately prove my point. Broadly I think your definition amounts to the same as mine, and is pretty much in line with
what I would expect a definition of a conceptual data model to be, but there are some key differences.
· Your definition is restricted to “kinds of things”. This is fine for a data model, but other forms of ontology (e.g. OWL or CL based ontologies, or indeed master data) can include individuals such
as you, me and the USA. [[Sjir3: this is a misunderstanding; I should have said that I consider individuals included.]]
MW: Then a. needs to say “the things considered within scope of a certain ontology.” [[Sjir3: I would propose to say: the things and kinds of things considered within scope
of a certain ontology.]]
Also,
· an ontology does not just need to consist of facts, but can include negations.
[[Sjir3: negations are included in my proposal as well.]]
MW: Then again it is not just “kinds of facts”, but “facts” that you should be referring to in b. [[Sjir3: it seems you want to include both facts and kinds of facts. My experience
is that it helps in practical conceptual engineering to make a distinction between three levels: ground facts, fact types and meta fact types.]]
Again, in databases we do generally only hold things that are asserted to be true (at least you have to work very hard to hold something negative) but that is not a limitation
of other forms of ontology. Finally,
· there are far more rules than integrity rules (unless you have a very different meaning for the word “integrity” than I have).
[[Sjir3: an integrity rule is a rule that restricts the states and the transitions of the fact base to permitted ones. Yes, we do consider other rules like derivation rules and behavioural rules.]]
MW: Then probably “integrity” should be dropped (unless you are trying to restrict what an ontology can be to the sort of ontology you deal with). [[Sjir3: I propose we investigate
this further using a representative set of illustrating examples.]]
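As one possible illustration of the rule kinds mentioned above, here is a minimal Python sketch. It is not taken from either correspondent: the triple-based fact base, the function names and the "compatriot_of" predicate are all assumptions made for the example. It contrasts an integrity rule on a state, an integrity rule on a transition, and a derivation rule that adds facts rather than restricting them.

```python
# Hypothetical sketch: a fact base is a set of (subject, predicate, object) triples.

def state_rule_one_birth_country(facts):
    """Integrity rule on a state: nobody has more than one birth country."""
    seen = {}
    for subj, pred, obj in facts:
        if pred == "born_in" and seen.setdefault(subj, obj) != obj:
            return False
    return True

def transition_rule_birth_country_fixed(old, new):
    """Integrity rule on a transition: a recorded birth country never changes."""
    old_births = {s: o for s, p, o in old if p == "born_in"}
    return all(old_births.get(s, o) == o for s, p, o in new if p == "born_in")

def derive_compatriots(facts):
    """Derivation rule: infers new facts instead of restricting existing ones."""
    births = [(s, o) for s, p, o in facts if p == "born_in"]
    return {
        (a, "compatriot_of", b)
        for a, country_a in births
        for b, country_b in births
        if a != b and country_a == country_b
    }

old = frozenset({("Mozart", "born_in", "Austria"), ("Mahler", "born_in", "Austria")})
print(state_rule_one_birth_country(old))
print(sorted(derive_compatriots(old)))
```

Under this reading, dropping the word “integrity” from the definition would simply mean that functions like `derive_compatriots` also count as part of the ontology.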
Examples can be as diverse as Cyc, a database schema, and Master Data.
1. Measure the quality of the result against the requirements that it should meet and fix the defects. [[Sjir: I suggest to take the three principles (Helsinki, 100 % and Conceptual) of ISO TR9007 into account.]]
MW: I’m sorry, I don’t follow you there. Could you elaborate please?
[[Sjir2: ISO TR9007 (TR stands for Technical Report, often a predecessor of a standard) was an effort by ISO that started in 1978 and was finished in 1987. It includes a validatable definition of a conceptual
schema (The description of the possible states of affairs of the universe of discourse including the classifications, rules, laws, etc., of the universe of discourse. (Page I-4)) and the following three principles:
MW: Yes, I remember it. It was enormously influential in the data modelling community.
The Helsinki Principle
Any meaningful exchange of utterances depends upon the prior existence of an agreed set of semantic and syntactic rules. The recipients of the utterances must only use these rules to interpret the received
utterances, if it is to mean the same as that which was meant by the utterer. ISO TC97/SC5/WG3- Helsinki 1978 (Page 0-2)
MW: As long as those rules include the definition of terms used in the exchange so the intended interpretation is clear, then that makes good sense. Of course, the rules are still necessary if they do not include those definitions, just
not sufficient.
Conceptualization Principle
A conceptual schema should only include conceptually relevant aspects, both static and dynamic, of the universe of discourse, thus excluding all aspects of (external and internal) data representation, physical
data representation and access as well as all aspects of a particular external user representation such as message format, data structures, etc. (page I-9)
MW: Yes. This is the sense in which data modellers use the word “conceptual”, i.e. it is a model of the things, rather than the way data about the things are structured.
100 Percent Principle
All relevant general static and dynamic aspects, i.e. all rules, laws, etc., of the universe of discourse should be described in the conceptual schema. The information system cannot be held responsible
for not meeting those described elsewhere, including in particular those in application programs. (page I-8)
MW: The key question here is what makes something relevant?
[[Sjir3: that is to be decided by the subject matter expert of the domain, not the ontologist.]]
MW: Well, I’m afraid I do not trust users to know what their requirements are. I have had more problems from people “building the requirements” than almost anything else. [[Sjir3:
I understand what you say here. However I believe that the conceptual modeler has the challenge to present his questions in concrete examples in such a way that the subject matter expert only has to say yes or no. This is a close analogy with testing of a
program. But my belief is that the subject matter expert has the power to decide, having listened to the arguments and illustrating examples of the conceptual modeler]]
2. Use a process or methodology to ensure the quality of the resultant ontology.
[[Sjir: I strongly agree with this.]]
That is, Proactive versus Reactive.
The advantage of using a methodology is that you get it (or at least more of it) right first time, thus avoiding the cost of rework to fix the defects. [[Sjir: I strongly agree with this.]]
- Do such methodologies exist for ontologies? [[Sjir: that depends on what you mean by ontology. Informally yes, but that is outside the “ontology” community.]]
MW: I believe there are also some within the “ontology” community, as well as the broader data modelling/relational database community.
[[Sjir2: please let me know which ones.]]
MW: At least parts of the Medical/Biological community have been using a methodology developed by Barry Smith and his co-workers. I presume Cyc have a methodology (they must have to successfully develop something
of that size). Chris Partridge has his BORO methodology. We have our ISO 15926 methodology (though that has its origins in data modelling) and I’m sure there are others. I’m hoping we will hear from them on this track.
[[Sjir3: I propose we start to use a concrete domain description to illustrate our discussions, together with a protocol for developing the formal conceptual domain model. Here is a description of a domain that is simple but sufficient
to illustrate some essential points:
MW: Good. This is a good example of what I mean about requirements being inadequately specified.
Requirements for the domain Famous composers
1. Domain SJ008g is about some aspects of famous composers and some related aspects. The community where Sjir lives is the owner of domain SJ008g. The community has very precise ideas about
what is within the scope of Domain SJ008g.
2. The community is interested in Famous composers,
MW: What does famous mean? How many people (or what percentage of the population) need to know about the composer for them to be considered “famous”? Or is fame defined in some other way?
[[Sjir3: this is decided by the SJ008g community.]]
3. but only if it is also known in the domain which their birth country is.
MW: Which country is of interest? Is it the country that existed when they were born, or the country that currently covers where they were born?
[[Sjir3: Thanks for this question. The concept definition list says it is the country that currently covers the place where the famous composer was born.]]
4. As living Famous composers are also declared within scope, the age at which a famous composer died, if applicable, is also within scope.
5. Furthermore the community wants to know the capital city of every country that is the birth country of a famous composer.
MW: Again, which capital is required, the current capital, or the capital when they lived there? [[Sjir3: the current capital.]]
6. Needless to say famous composers in domain SJ008g have exactly one birth-country.
MW: Not if you want to hold both the current country of their birth place and the historical country of their birth place. [[Sjir3: with the answers
given above, this is solved.]]
7. And also needless to say that each birth-country has exactly one capital city.
MW: But which capital? The cultural capital, the financial capital or the political capital? [[Sjir3: the political capital.]]
8. And also that each capital city is the capital of exactly one such country.
MW: But which country, the current one or the historical one? (A current capital might not have been one historically and vice-versa.) [[Sjir3:
the current.]]
9. Furthermore of such composers it is considered within scope which composer visited which country in which year.
10. Only visits to a country other than the birth country are considered.
11. Of such a country the community also declared within scope that the capital must be known.
MW: So what if it isn’t? Does that mean the country cannot be known, or that you make a capital city up? (This is typical of the kind of constraint that leads to poor quality data being produced,
when for example a salesman has to know what kind of industry the customer is in before he can enter a customer on the database, so they fill anything in – usually the default value – at least with a blank field you know they don’t know.)
[[Sjir3: every country currently has a political capital.]]
[[Sjir3: a few illustrating examples of ground facts the SJ008g community currently holds:
F1: Wolfgang Amadeus Mozart was born in Austria.
F2: Wolfgang Amadeus Mozart died at 35 years old.
F3: Giuseppe Verdi was born in Italy.
F4: Giuseppe Verdi died at 87 years old.
F5: Gustav Mahler was born in Austria.
F6: Gustav Mahler died at 50 years old.
F7: The capital of Italy is Rome.
F8: The capital of Austria is Vienna.
]]
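To make the requirements and ground facts above easier to discuss, here is a minimal Python sketch of how they might be held and checked. It is only one possible reading, not the SJ008g community's model: the `FactBase` class, its field names and the `violations` function are assumptions introduced for illustration. It loads facts F1 to F8 and checks requirements 6, 7, 8, 10 and 11 as integrity rules on the current state.

```python
from dataclasses import dataclass, field

@dataclass
class FactBase:
    # Each list holds ground facts of one (hypothetical) fact type.
    born_in: list = field(default_factory=list)      # (composer, country)
    died_at_age: list = field(default_factory=list)  # (composer, age)
    capital_of: list = field(default_factory=list)   # (country, city)
    visited: list = field(default_factory=list)      # (composer, country, year)

def violations(fb):
    """Return a list of human-readable integrity rule violations."""
    problems = []
    composers = ({c for c, _ in fb.born_in} | {c for c, _ in fb.died_at_age}
                 | {c for c, _, _ in fb.visited})
    # Requirement 6: every composer has exactly one birth country.
    for composer in sorted(composers):
        n = len({k for c, k in fb.born_in if c == composer})
        if n != 1:
            problems.append(f"{composer}: {n} birth countries (requirement 6)")
    # Requirements 7 and 11: every birth or visited country has exactly one capital.
    countries = {k for _, k in fb.born_in} | {k for _, k, _ in fb.visited}
    for country in sorted(countries):
        n = len({city for k, city in fb.capital_of if k == country})
        if n != 1:
            problems.append(f"{country}: {n} capitals (requirements 7 and 11)")
    # Requirement 8: every capital city is the capital of exactly one country.
    for city in sorted({city for _, city in fb.capital_of}):
        n = len({k for k, c in fb.capital_of if c == city})
        if n != 1:
            problems.append(f"{city}: capital of {n} countries (requirement 8)")
    # Requirement 10: visits to the birth country are out of scope.
    births = set(fb.born_in)
    for composer, country, year in fb.visited:
        if (composer, country) in births:
            problems.append(f"{composer}: visit to birth country {country} in {year} (requirement 10)")
    return problems

fb = FactBase(
    born_in=[("Wolfgang Amadeus Mozart", "Austria"), ("Giuseppe Verdi", "Italy"),
             ("Gustav Mahler", "Austria")],                      # F1, F3, F5
    died_at_age=[("Wolfgang Amadeus Mozart", 35), ("Giuseppe Verdi", 87),
                 ("Gustav Mahler", 50)],                         # F2, F4, F6
    capital_of=[("Italy", "Rome"), ("Austria", "Vienna")],       # F7, F8
)
print(violations(fb))  # → []
```

In this sketch, recording a visit to a country whose capital has not been entered would be reported under requirement 11 rather than silently accepted, which is the situation MW asks about above. Whether such a fact should be rejected, or stored and flagged, is a design choice the sketch leaves open.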
Regards
Matthew West
Information Junction
http://www.informationjunction.co.uk/
http://www.matthew-west.org.uk/