MW: Well I’m not sure there is a generally agreed definition of what an ontology is [[Sjir2: indeed, I agree with you and I believe this forum should admit that clearly, and start work to get to a series of definitions that can be assigned to the widely varying kinds of “ontologies” mentioned in this forum.]], but we are not talking about the philosophical study of what exists. [[Sjir2: I agree.]] My definition for the purposes of this summit is:
A formal (i.e. computer processable) representation of (some of) the things that exist and (some of) the rules that govern them.
[[Sjir2: Another proposal: A complete and truly conceptual (in the sense of ISO TR9007) ontology is a formal (i.e. computer processable) representation of
a. the kinds of things considered within scope of a certain ontology,
b. the kinds of facts about instances of these kinds of kinds and
c. all the associated integrity rules about the fact populations and fact population transitions.
d. There is always a human understandable representation (in a CNL), that is extended with a set of all relevant concept definitions.]]
MW: As I said, there is no general consensus on the definition of ontology, and you immediately prove my point. Broadly I think your definition amounts to the same as mine, and is pretty much in line with what I would expect a definition of a conceptual data model to be, but there are some key differences.
· Your definition is restricted to “kinds of things”. This is fine for a data model, but other forms of ontology (e.g. OWL or CL based ontologies, or indeed master data) can include individuals such as you, me and the USA. [[Sjir3: this is a misunderstanding; I should have said that I consider individuals included.]]
MW: Then a. needs to say “the things considered within scope of a certain ontology.” [[Sjir3: I would propose to say: the things and kinds of things considered within scope of a certain ontology.]]
MW3: In my scheme of things, things already includes kinds of things, so that would be redundant.
Also,
· an ontology does not just need to consist of facts, but can include negations. [[Sjir3: are as well included in my proposal.]]
MW: Then again it is not just “kinds of facts”, but “facts” that you should be referring to in b. [[Sjir3: it seems you want to include both facts and kinds of facts. My experience is that it helps in practical conceptual engineering to make a distinction between three levels: ground facts, fact types and meta fact types.]]
MW3: But fact types are facts, and so are meta fact types, they are just about different things. My approach is not to distinguish levels because it introduces more problems than it solves by creating a barrier between the levels. My main objection to entity relationship models is that you cannot say that an instance of an entity type is a subtype of an entity type, nor that one entity type is an instance of another. This forced separation is a barrier to expressiveness and requires choices to be made earlier than is sometimes useful.
Again, in databases we do generally only hold things that are asserted to be true (at least you have to work very hard to hold something negative) but that is not a limitation of other forms of ontology. Finally,
· there are far more rules than integrity rules (unless you have a very different meaning for the word “integrity” than I have). [[Sjir3: an integrity rule is a rule that restricts the states and the transitions of the fact base to permitted ones. Yes, we do consider other rules like derivation rules and behavioural rules.]]
MW: Then probably “integrity” should be dropped (unless you are trying to restrict what an ontology can be to the sort of ontology you deal with). [[Sjir3: I propose we investigate this further using a representative set of illustrating examples.]]
Examples can be as diverse as Cyc, a database schema, and Master Data.
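To make the distinction between integrity rules and other rules concrete, here is a minimal sketch in Python. It assumes a fact base that is simply a set of (composer, birth country) pairs; the function names and the sample facts are invented for illustration and come from neither participant. The first two functions restrict permitted states and transitions (integrity rules in Sjir's sense), while the third derives new information without restricting anything.

```python
# A fact is a hypothetical (composer, birth_country) pair; illustration only.
Fact = tuple[str, str]


def state_rule_ok(facts: frozenset[Fact]) -> bool:
    """State integrity rule: each composer has at most one birth country."""
    composers = [composer for composer, _ in facts]
    return len(composers) == len(set(composers))


def transition_rule_ok(old: frozenset[Fact], new: frozenset[Fact]) -> bool:
    """Transition integrity rule: a recorded birth country never changes."""
    old_map = dict(old)
    # A composer absent from the old state may be added freely.
    return all(old_map.get(composer, country) == country
               for composer, country in new)


def derive_countries(facts: frozenset[Fact]) -> frozenset[str]:
    """Derivation rule: computes the set of birth countries from the facts."""
    return frozenset(country for _, country in facts)


before = frozenset({("Bach", "Germany")})
after = frozenset({("Bach", "Germany"), ("Verdi", "Italy")})
print(state_rule_ok(after), transition_rule_ok(before, after))
print(sorted(derive_countries(after)))
```

Whether behavioural rules fit this picture is exactly the open question in the exchange; the sketch takes no position on it.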
1. Measure the quality of the result against the requirements that it should meet and fix the defects. [[Sjir: I suggest to take the three principles (Helsinki, 100 % and Conceptual) of ISO TR9007 into account.]]
MW: I’m sorry, I don’t follow you there. Could you elaborate please?
[[Sjir2: ISO TR9007 (TR stands for Technical Report, often a predecessor of a standard) was an effort by ISO that started in 1978 and was finished in 1987. It includes a validatable definition of a conceptual schema (The description of the possible states of affairs of the universe of discourse including the classifications, rules, laws, etc., of the universe of discourse.
MW: Yes, I remember it. It was enormously influential in the data modelling community.
(Page I-4) ) and the following three principles:
The Helsinki Principle
Any meaningful exchange of utterances depends upon the prior existence of an agreed set of semantic and syntactic rules. The recipients of the utterances must only use these rules to interpret the received utterances, if it is to mean the same as that which was meant by the utterer. ISO TC97/SC5/WG3- Helsinki 1978 (Page 0-2)
MW: As long as those rules include the definition of terms used in the exchange so the intended interpretation is clear, then that makes good sense. Of course it is still necessary if they are not, just not sufficient.
Conceptualization Principle
A conceptual schema should only include conceptually relevant aspects, both static and dynamic, of the universe of discourse, thus excluding all aspects of (external and internal) data representation, physical data representation and access as well as all aspects of a particular external user representation such as message format, data structures, etc. (page I-9)
MW: Yes. This is the sense in which data modellers use the word “conceptual”, i.e. it is a model of the things, rather than the way data about the things are structured.
100 Percent Principle
All relevant general static and dynamic aspects, i.e. all rules, laws, etc., of the universe of discourse should be described in the conceptual schema. The information system cannot be held responsible for not meeting those described elsewhere, including in particular those in application programs. (page I-8)
MW: The key question here is what makes something relevant? [[Sjir3: that is to be decided by the subject matter expert of the domain, not the ontologist.]]
MW: Well, I’m afraid I do not trust users to know what their requirements are. I have had more problems from people “building the requirements” than almost anything else. [[Sjir3: I understand what you say here. However I believe that the conceptual modeler has the challenge to present his questions in concrete examples in such a way that the subject matter expert only has to say yes or no. This is a close analogy with testing of a program. But my belief is that the subject matter expert has the power to decide, having listened to the arguments and illustrating examples of the conceptual modeler.]]
MW3: That’s fine. It’s a problem when an analyst accepts what users say uncritically. The real job of the analyst is to see behind the requirements presented by a user to what the real requirements are. Just accepting “requirements as stated” and then blaming the users when the true requirements are not met “but you didn’t say that was a requirement” is an abdication of responsibility.
2. Use a process or methodology to ensure the quality of the resultant ontology. [[Sjir: I strongly agree with this.]]
That is, Proactive versus Reactive.
The advantage of using a methodology is that you get it (or at least more of it) right first time, thus avoiding the cost of rework to fix the defects. [[Sjir: I strongly agree with this.]]
- Do such methodologies exist for ontologies? [[Sjir: that depends on what you mean by ontology. Informally yes, but that is outside the “ontology” community.]]
MW: I believe there are also some within the “ontology” community, as well as the broader data modelling/relational database community. [[Sjir2: please let me know which ones.]]
MW: At least parts of the Medical/Biological community have been using a methodology developed by Barry Smith and his co-workers. I presume Cyc have a methodology (they must have to successfully develop something of that size). Chris Partridge has his Boro methodology. We have our ISO 15926 methodology (though that has its origins in data modelling) and I’m sure there are others. I’m hoping we will hear from them on this track.
[[Sjir3: I propose we start to use a concrete domain description to illustrate our discussions and a protocol how to develop the formal conceptual domain model. Here is a description of a simple domain but sufficient to illustrate some essential points:
MW: Good. This is a good example of what I mean about requirements being inadequately specified.
Requirements for the domain Famous composers
1. Domain SJ008g is about some aspects of famous composers and some related aspects. The community where Sjir lives is the owner of domain SJ008g. The community has very precise ideas what is within the scope of Domain SJ008g.
2. The community is interested in Famous composers,
MW: What does famous mean? How many people (or what percentage of the population) need to know about the composer for them to be considered “famous”? Or is fame defined in some other way? [[Sjir3: this is decided by the SJ008g community.]]
MW3: That is not an adequate response. The criteria for fame may well have information requirements associated with them.
3. but only if it is also known in the domain which their birth country is.
MW: Which country is of interest? Is it the country that existed when they were born, or the country that currently covers where they were born? [[Sjir3: Thanks for this question. The concept definition list says it is the country that currently covers where the famous composer was born.]]
MW: That is a shame. That makes this look like a current state problem, which is much easier and less interesting than problems that look at change over time and maintenance of history. Current state problems are relatively trivial.
MW3: Also, what is a country? Is Scotland a country? Is the United Kingdom?
4. As living Famous composers are also declared within scope, the age at which a famous composer died, if applicable, is also within scope.
5. Furthermore the community wants to know the capital city of every country that is the birth country of a famous composer.
MW: Again, which capital is required, the current capital, or the capital when they lived there? [[Sjir3: the current capital.]]
6. Needless to say famous composers in domain SJ008g have exactly one birth-country.
MW: Not if you want to hold both the current country of their birth place and the historical country of their birth place. [[Sjir3: with the answers given above, this is solved.]]
7. And also needless to say that each birth-country has exactly one capital city.
MW: But which capital? The cultural capital, the financial capital or the political capital? [[Sjir3: the political capital.]]
8. And also that each capital city is the capital of exactly one such country.
MW: But which country, the current one or the historical one? (A current capital might not have been one historically and vice-versa.) [[Sjir3: the current.]]
MW3: What is it about people that are interested in famous composers that makes them the authoritative source for the definition of what a capital city is? After your last message I took the trouble to look into Capital Cities. Capital Cities are not such because those interested in famous composers say they are. It turns out that each country is the authoritative source for its capital city. So for example:
1. The capital city of The Netherlands is Amsterdam, despite The Hague being the seat of government. This is so because the constitution says so.
2. Both Israel and Palestine claim Jerusalem as their capital (so much for a city only being the capital of one country).
3. London is the capital of both England and the United Kingdom.
4. I see no reason why a country could not declare more than one city as their capital cities (what is to prevent them?)
A key principle should be to seek the authoritative source for rules. There are of course rules that music lovers are the authoritative source for, e.g. what makes a composer famous, but not other things, even in a data model to support their information requirements.
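The capital city examples above can be sketched as data. The following Python fragment is only an illustration: it assumes each country is the source of its own capital claim and stores the claims as plain (country, city) pairs; the names CLAIMS, countries_by_capital and shared_capitals are invented here. It shows that the "exactly one country per capital" rule in requirement 8 does not survive contact with those examples.

```python
from collections import defaultdict

# Capital claims taken from the examples in the message above.
# Each pair is (country making the claim, city it names as capital).
CLAIMS = [
    ("The Netherlands", "Amsterdam"),
    ("Israel", "Jerusalem"),
    ("Palestine", "Jerusalem"),
    ("England", "London"),
    ("United Kingdom", "London"),
]


def countries_by_capital(claims):
    """Group the claiming countries under each city named as a capital."""
    result = defaultdict(set)
    for country, city in claims:
        result[city].add(country)
    return dict(result)


def shared_capitals(claims):
    """Cities claimed as capital by more than one country."""
    grouped = countries_by_capital(claims)
    return sorted(city for city, countries in grouped.items()
                  if len(countries) > 1)


print(shared_capitals(CLAIMS))  # ['Jerusalem', 'London']
```

Modelling the relation as many-to-many and recording who asserts each pair leaves room for a country that declares several capitals, which a single "capital" attribute on a country record would rule out.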
9. Furthermore of such composers it is considered within scope which composer visited which country in which year.
MW3: How do you visit a country that did not exist?
10. Only visits to a country other than the birth country are considered.
11. Of such a country the community also declared within scope that the capital must be known.
MW: So what if it isn’t? Does that mean the country cannot be known, or that you make a capital city up? (This is typical of the kind of constraint that leads to poor quality data being produced, when for example a salesman has to know what kind of industry the customer is in before he can enter a customer on the database, so they fill anything in – usually the default value – at least with a blank field you know they don’t know.) [[Sjir3: every country currently has a political capital.]]
MW3: It was actually quite easy to find a list of capital cities, but there are similar examples where that is not so. Let us suppose we are talking about a customer, you might require that you must have a telephone number for all customers, but if a customer writes to you by email and does not provide a telephone number in that email, that means you cannot enter them into the customer database and fulfill their order (unless you make a telephone number up).
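The telephone number example can be sketched as follows. This is a minimal Python illustration under stated assumptions: the Customer class, its fields and the two helper functions are hypothetical and not taken from any system mentioned in the thread. The point is only that an optional field records "not known" honestly, whereas a mandatory field invites made-up values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None  # None means "not known", not a default value


def can_be_contacted(customer: Customer) -> bool:
    """An order can proceed if at least one contact route is known."""
    return customer.email is not None or customer.phone is not None


def unknown_fields(customer: Customer) -> list[str]:
    """List the contact details still to be obtained from the customer."""
    return [field for field in ("email", "phone")
            if getattr(customer, field) is None]


# A customer who wrote by email and gave no telephone number.
customer = Customer(name="A. Customer", email="a.customer@example.com")
print(can_be_contacted(customer))  # True
print(unknown_fields(customer))    # ['phone']
```

The rule that matters to the business (some way to reach the customer) is still enforced, but it is expressed over the contact routes as a whole rather than as a mandatory value on one field.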
Regards
Matthew West
Information Junction