
Re: [ontolog-forum] Plural taxonomies?

To: doug@xxxxxxxxxx, "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Mike Bennett <mbennett@xxxxxxxxxxxxxxx>
Date: Fri, 11 Jun 2010 16:29:57 +0100
Message-id: <4C125675.4070006@xxxxxxxxxxxxxxx>
Hi Doug,    (01)

You raise some interesting points, sorry it took me a while to try and do 
this justice...    (02)

doug foxvog wrote:
> On Mon, May 31, 2010 15:42, Mike Bennett said:
>   
>> ... We are
>> developing a formal semantic model of terms in the financial services
>> industry, and to do this we are using the underlying concepts of OWL but
>> restating these in English. In OWL there are classes (with the
>> super-class of owl:Thing), and two types of properties, Object
>> Properties and Datatype Properties. We refer to the OWL Classes as
>> "Things" and properties as "Facts" namely "Relationship Facts" and
>> "Simple Facts" respectively.
>>     
>
> Could you clarify?  Since OWL Classes are a sub-class of Thing, shouldn't
> the instances of the OWL Classes be "Things"?  Likewise, shouldn't "Facts"
> be the assignment of Properties to Things?
>   
Using the ODM spec (in an earlier draft than the current one) we have 
created a UML model in which certain defined UML base classes are used 
to represent certain OWL constructs, identified by suitable UML 
stereotypes. Of these, a UML base class of "Class" is used for 
owl:Class. This has the stereotype of owlClass. However, we don't use 
the same words externally as we do internally, since the model is 
intended (and used) for presentation of model content to business 
subject matter experts. So it's called a class internally but not 
described in those terms.    (03)

Meanwhile as you say, all the OWL Classes in the model are ultimately 
sub-types of the library object "Thing", itself an OWL Class.    (04)

We considered using the more accurate term "Entity", as suggested also 
by John a while back. However, Entity, like Class, already comes with 
its own semantic baggage: a lot of business SMEs have over the years 
used both class models and more frequently entity relationship diagrams, 
to specify what they fondly believe is a business view of the world, 
using a language that the techies understand. In my view, this is poor 
management of the "language interface" that should exist between 
business conceptual models and design models, but the reality is we live 
with the consequences of poor IT management, so Entity as a word was not 
in a good place. "Thing" was the only unharmed word I could find, and 
even if OWL models did not have Thing at the top of the taxonomic tree, 
it's the word I would have used to describe the world of "Things and 
facts".    (05)

Similarly, I would not want to expose words like Object Property or 
Datatype Property to business SMEs. It is futile to expect to educate 
business domain experts in a new language in order to get their 
participation, and dangerous to assume they have learnt your favourite 
new language. To get the maximum confidence that a draft model presented 
to SMEs for approval has been understood and validated (or updated) 
correctly, you need to know that all business folks have the same 
understanding of the same terms. Hence the need to use only terms that 
have no history and no part in some formal language. Sadly the English 
language is running out of such words.    (06)

So when I describe the framework in terms of "Things and Facts" I am not 
presenting that explanation to an OWL-savvy audience of ontologists and 
I am not asking them to interpret them in the light of what they know 
about ontology. I am presenting it as an explanation of how when you 
look at the diagrams and the spreadsheets from the model, what you will 
see are things and facts. Then I go on to explain how Things are 
essentially set theory constructs, and how the facts about them can be 
relationship facts (relating one thing to another thing) or simple facts 
where the fact is stated in terms of simple stuff like text and dates 
and so on. Meanwhile for the ontologist it should be apparent that 
relationship facts are Object Properties and that simple facts are 
Datatype Properties. And of course that facts are properties.    (07)
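
As a rough sketch of that vocabulary mapping (the Python names here are purely illustrative and not part of the EDM Council model), the business-facing words can be read as a lookup onto the OWL constructs they stand for:

```python
from enum import Enum


class OwlConstruct(Enum):
    """The three OWL constructs the business vocabulary hides."""
    CLASS = "owl:Class"
    OBJECT_PROPERTY = "owl:ObjectProperty"
    DATATYPE_PROPERTY = "owl:DatatypeProperty"


# Business-facing word -> the OWL construct it stands for.
BUSINESS_VOCABULARY = {
    "Thing": OwlConstruct.CLASS,
    "Relationship Fact": OwlConstruct.OBJECT_PROPERTY,
    "Simple Fact": OwlConstruct.DATATYPE_PROPERTY,
}


def owl_term_for(business_word: str) -> str:
    """Return the OWL construct name behind a business-facing word."""
    return BUSINESS_VOCABULARY[business_word].value


print(owl_term_for("Relationship Fact"))
print(owl_term_for("Simple Fact"))
```

The business SME only ever sees the keys on the left; the ontologist can read off the values on the right.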

There are almost no instances (owl:individual) in the model, except 
where specific instances have to be identified for modeling reasons e.g. 
the USA as an instance of a Country, ISIN as an instance of security 
identifier and so on. I have a feeling that many folks who work more 
extensively with OWL models that have both class and individual data 
would tend to refer to individuals as "things", maybe I'm wrong about 
that. But that's not the language I use or am using here. As it happens 
I refer to individuals as "Individuals", one OWL term I have not felt a 
need to re-cast for presentation to SMEs. Is this what you mean by 
"thing" in your message? But here we are talking about the words we use 
not the concepts in the model.    (08)

I should add that there are never going to be large numbers of 
Individuals in the EDM Council model, since it is not intended to become a 
repository of actual securities data, the volume of securities data out 
there is too vast for that to even be thinkable. Rather, it is intended 
as the thus-far missing business conceptual model against which various 
logical data models designs may be built, or may be referenced to after 
the event. At present the semantics underlying most logical data model 
designs in the industry are in the designer's head or on informally 
structured spreadsheets, so this is an attempt to improve upon that 
broken process using semantic technology. I believe we can do more with 
it, but that's the original use case. So if I seem a little ignorant of 
the terminology of linked data and of OWL models with individual data, 
that's because I don't really need to engage with it for this project.    (09)

So instances of OWL classes, where they exist, are "Individuals", just 
like they are in OWL. Every class in the model is a class of "Thing", 
that is it is asserted to represent a real thing and not a data 
construct. Any data model developed from this would have logical classes 
of data, and of course instances of those data model classes would be 
instances of data. Not Things.    (010)

>   
>> These are modeled in a UML modeling tool
>> from which we produce both diagrams and spreadsheets, for review via a
>> website (this is at www.hypercube.co.uk/edmcouncil ).
>>     
>
>   
>> Anyway, each "Thing" and each "Fact" has a label which is a simple
>> textual name for that term, using whatever term business domain experts
>> are most comfortable with.
>>     
>
> I'm pleased that you distinguish labels from names at this point.
> However, below you discuss "names", not labels.
>   
Loose terminology on my part, I'm afraid, but what is a name if it is 
not some kind of label? Some of the ideas you bring up below are things 
I should probably have thought of sooner, and will try and implement at 
some point in the future. Meanwhile, when I am describing what is there 
now, I should clarify that what is there now is a UML model, in which 
every UML class, UML association and attribute has a name, identified in 
the UML tool as "Name". This name is a label for the class, association 
or attribute, in the same way that any name is a label for the thing it 
is a name of.    (011)

UML also generates UIDs, but I have not made reference to these.    (012)

One decision I made when starting this model, was not to create a 
separate tagged value in which to create and hand-edit a URI. My 
thinking was that at some point, we would want to find a way of 
converting the model content into OWL, and that I would far rather that 
the OWL URIs are assigned as part of that process, rather than being 
maintained by hand with the errors that would introduce.    (013)

Something that might have been worth doing, and which I would discuss as 
part of any future tightening up of this repository, is to have a new 
tagged value for a formal unique label, perhaps structured in accordance 
with ISO 11179 Naming and Design Rules. I did think about using ISO 
11179 NDR for the formal names in the UML model, but I decided against 
this on the basis that business subject matter experts would struggle 
with this, and it would re-introduce something techie-looking. My 
thinking is that once something looks even remotely techie, business 
experts in the financial industry tend to glaze over and assume that the 
techies must have got it right (and it will be the techies' fault if it 
isn't). This is not good if you want to take the more industrial 
approach, as I do, that all technical artefacts must trace back to some 
business conceptual model that is understood, reviewed and signed off by 
business as being correct. This is a bit of a cultural shift for the 
financial securities industry, which is why we often make mistakes that 
cost millions of dollars (the counter-example I usually give is the oil 
industry but I guess now's not a good time to go there!). So that's why 
I didn't use NDR. The same thing applies to anything in camel case, and 
anything with punctuation marks you would not find in a novel or a 
newspaper.    (014)

So a possible future update to this model would be to add an ISO 11179 
or other unique label which remains the same no matter what label is 
presented to the user. That could also be in camel case, making the 
transformation to an owl model a lot simpler since one would not have to 
try and collapse the white spaces that inevitably exist in natural 
language labels such as our current names.    (015)
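
To illustrate the white-space collapsing mentioned above, here is a minimal sketch. The function name and the exact casing rule are assumptions for illustration, not an agreed convention for this model:

```python
import re


def to_camel_case(label: str) -> str:
    """Collapse a natural-language label into an upper camel case identifier.

    Each run of letters/digits becomes one word; only the first character of
    a word is upper-cased, so existing acronyms such as "MBS" are preserved.
    """
    words = re.findall(r"[A-Za-z0-9]+", label)
    return "".join(word[0].upper() + word[1:] for word in words)


print(to_camel_case("MBS Deal"))
print(to_camel_case("Mortgage backed security issue"))
```

Two different labels can collapse to the same identifier (for instance labels differing only in punctuation), so a real transformation would also need a uniqueness check.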

Of course in UML tools the only label that can be presented to the user 
as the "name" of the thing is the UML "Name" label since this sits at 
the top of the box or (for attributes and associations) is the only 
visible text on diagrams. To do any of the other things we are 
discussing would require a tool which is more than a UML modeling tool.    (016)

> A solution to the problems you discuss may lie in distinguishing an
> internal name for a term (which is used by programs for access and
> processing) from labels which identify the terms to users.  In other
> areas of computer science, the users of programs have no idea of what
> internal names are used in programs and could care less.  Why should
> semantic programs make internal names visible to users?
>   
As noted above, this is a very good idea which I should possibly have 
thought of at the outset.    (017)

At some future point, we hope to put this model content forward as a 
possible semantic layer for the financial industry messaging standard 
ISO20022. In order to do this, a lot of the things where I have taken a 
pragmatic approach will need to be re-done, and of course both OWL and 
ODM have been considerably updated since I created the framework for 
this model. The UML tool I use (Enterprise Architect) does not seem to 
handle changes made to stereotypes on the fly, i.e. if I add a new 
tagged value (such as for "Internal Label"), it is not propagated to 
existing terms that use that stereotype. I doubt any UML tool does this 
but it would be good if they did.    (018)

So at some point the whole thing will need to be rebuilt. My agenda over 
the next couple of years (including in here and at SemTech) is to 
identify all the things that should be added or changed, so that it can 
be rebuilt right, once. Meanwhile we build in the existing format, in 
which every term has a UML "Name" label and any number of synonyms, but 
no immutable internal label (except the UML GUID of course). I wish I'd 
thought of it, but it's not preventing me doing anything at present.    (019)
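
A sketch of what a rebuilt term record might look like, assuming a hypothetical immutable internal label alongside the displayed name and synonyms (none of this exists in the current UML model):

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    internal_id: str   # hypothetical stable label, never shown to business users
    name: str          # the displayed label, i.e. today's UML "Name"
    synonyms: list[str] = field(default_factory=list)

    def rename(self, new_name: str) -> None:
        """Change the displayed name and keep the old one as a synonym."""
        if new_name == self.name:
            return
        if new_name in self.synonyms:
            # The new name is being promoted from synonym to displayed name.
            self.synonyms.remove(new_name)
        self.synonyms.append(self.name)
        self.name = new_name


term = Term(internal_id="MortgageBackedSecurityIssue", name="MBS Issue")
term.rename("MBS Deal")
print(term)
```

The point of the sketch is that a rename touches only the displayed name and the synonym list, while anything that refers to the term by its internal label is unaffected.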

>   
>> Other words with precisely the same meaning
>> are identified as "Synonym" using a tag set up for that purpose. For
>> instance the other day I renamed "MBS Issue" to "MBS Deal" since I
>> learnt that's what they call it most often, and put the previous name
>> into the "Synonym" tag.
>>     
>
> If EnglishWords and Phrases had denotations as terms and terms had
> preferred phrases (which could vary by context), then renaming would
> not be an issue. Internally, an OWL Class named MortgageBackedSecurityIssue
> could have "MBS Issue" as a preferred denotation.  In the given
> case, it would also become a denotation "MBS Deal", which (in the new
> context) would become its preferred phrase.
>   
Indeed. What would be interesting would be if there were some way of 
associating different denotations with different contexts. In fact, 
since the model is partitioned according to the "Independent / Relative 
/ Mediating" top level partition set, the terms for the actual contexts 
can be defined under "Mediating", and some already are. If a tool (again 
not this UML tool) could    (020)

> A company in Saginaw, Michigan, could use the same ontology, adding
> a Class named MBSAirportBondIssue, giving that as a denotation of "MBS
> Issue", which might be its preferred phrase.
>   
Yes. A lot of people, indeed most people, seem to use terms very locally 
and often struggle with the idea that their meaning for a word is not 
the only one in the universe. Hence the use of a semantic model at all, 
if words were sufficient we would not need to do any of this.    (021)

I think if one were to create a framework similar to the EDM Council SR, 
but which people could extend and edit locally, then this sort of local 
labelling would be a must-have feature. Again, this is not offered by 
the UML tool.    (022)
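
A minimal sketch of such local labelling, assuming a simple per-context lookup with a fallback to a default label. The context key "saginaw" and the default label "MBS Airport Bond Issue" are made up for illustration:

```python
DEFAULT_CONTEXT = "default"

# internal name -> {context -> preferred label}; all entries are illustrative.
PREFERRED_LABELS = {
    "MortgageBackedSecurityIssue": {
        DEFAULT_CONTEXT: "MBS Deal",
    },
    "MBSAirportBondIssue": {
        DEFAULT_CONTEXT: "MBS Airport Bond Issue",
        "saginaw": "MBS Issue",
    },
}


def preferred_label(internal_name: str, context: str = DEFAULT_CONTEXT) -> str:
    """Return the label to display for a term in a given context.

    Falls back to the default label when the context defines no local one.
    """
    labels = PREFERRED_LABELS[internal_name]
    return labels.get(context, labels[DEFAULT_CONTEXT])


print(preferred_label("MBSAirportBondIssue", "saginaw"))
print(preferred_label("MortgageBackedSecurityIssue", "saginaw"))
```

In this arrangement the same phrase can be the preferred label of different terms in different contexts without the terms themselves being confused.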

> How to deal with conflicting inputs would be up to the API, which could
> have context-related rules.
>
>   
>> You are right that context is needed to deal with meaning. One can
>> either come up with contextual display arrangements such as hover help,
>> or use semantic modelling, such as OWL, which implements (albeit
>> imperfectly) the fundamentals of logic. Using a semantic notation then
>> allows us to formally define what a term means, both by its position in
>> a taxonomy (so Linnaeus' Taxonomy of Species tells us what kind of thing
>> a lynx is), and by the logical statement of facts about those things.
>> The facts are what distinguishes an ontology from a taxonomy, in most
>> accepted definitions of those two words.
>>     
>
>
>
>   
>> The difficult bit is keeping that definitional rigour and yet presenting
>> the information in ways that subject matter experts can understand.
>>     
>
> Once you separate the internal name from the phrase presented to the
> user, this becomes easier.  Hover help can certainly assist.  As could
> contextual choice of phrase to produce.
>
>   
It sounds like we are putting together the outline specification for a 
tool. At present I have to work within the limitations of a tool 
configured to do something different. I wonder if anyone would develop 
such a tool? There is a lot I would add to its functionality.    (023)


>> The
>> logic is every bit as complex as any programming concepts, but has no
>> relation to software development concepts, so it takes a while to
>> communicate this to the subject matter experts, in my experience. Also
>> some business folks are more comfortable looking at spreadsheets whereas
>> others are better looking at diagrams - this is a difference between
>> different people in any walk of life. Hence we represent all the same
>> information in both formats. It still isn't easy, but it means that for
>> every term on the diagram or in the spreadsheet, there are enough
>> qualifying terms around it to precisely disambiguate it from any terms
>> that might have the same or a similar name (heteronyms). It should be
>> possible to take any one term and rename it "banana" and still identify
>> what is meant by banana in that context, if it's modeled right.
>>     
>
> I question this.  There is a distinction between providing enough infor-
> mation to distinguish homonyms and having enough information to precisely
> define a term.
>   
True.  There are a couple of interesting questions around this. On the 
one hand, you have the pragmatic consideration that it is never 
practical to model every fact about a given kind of thing, for a given 
application or requirement. So there is the basic decision taken by any 
ontologist about what facts are germane to the application they are 
developing. This is the well known ontological commitment, and is a 
consideration for any developer whether the ontology for their 
application is ever modeled using an ontology notation, or is just kept 
in their head and reflected in the logical data model design. We all 
make ontological commitments for a given application. This is made more 
complicated for a standards initiative like this one, because we need to 
identify all the facts that might be relevant to any application or 
securities processing context (investment, risk management, securities 
processing, compliance, and of course systemic risk at 
trans-national level, a new requirement).    (024)

So you don't need to model the entire DNA sequence of a duck to model 
something as being a duck.    (025)

That's the easy bit and I've said nothing new here.    (026)

Another aspect of this is that in theory, for each new kind of "Thing", 
I need only identify one fact, one facet about that thing, that 
distinguishes it from other things. So I have a class of things called 
Bond, and I create a set of sub-classes called Municipal bond, Sovereign 
Bond and Corporate Bond. It works well in that example - the single, 
defining feature of a Municipal Bond is that it is issued by a 
municipality; the three are distinguished around the facet of "issued 
by", which itself is a fact about the parent class, narrowed in the 
child classes.    (027)
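
The single-facet idea can be sketched as a lookup from the kind of issuer to the sub-class of Bond it defines. Only "municipality" is taken from the example above; the other facet values and all Python names are assumptions:

```python
# Value of the "issued by" facet -> the sub-class of Bond that it defines.
BOND_SUBCLASS_BY_ISSUER_KIND = {
    "municipality": "Municipal Bond",
    "sovereign": "Sovereign Bond",      # assumed facet value
    "corporation": "Corporate Bond",    # assumed facet value
}


def classify_bond(issued_by_kind: str) -> str:
    """Pick the Bond sub-class from the single 'issued by' facet.

    An unrecognised issuer kind narrows nothing, so the parent class is returned.
    """
    return BOND_SUBCLASS_BY_ISSUER_KIND.get(issued_by_kind, "Bond")


print(classify_bond("municipality"))
print(classify_bond("supranational"))
```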

There are however a number of facts that only apply to bonds issued by 
municipalities. So those are necessary facts versus incidental facts.    (028)

However, aside from that simple example, in some places in the model 
there may be so many unique facts about a given kind of thing that it 
becomes difficult to say which are necessary, and impossible to reach 
consensus on that question. We could, in theory, introduce a new level 
of taxonomic hierarchy for every one fact. However, a typical set of 
securities terms has a thousand or more unique terms, so it would be 
impractical to branch the hierarchy for every new fact or facet.    (029)

Instead what we have done, based on existing industry standards, is to 
take the hierarchy of classifications that already exists in the 
industry, and put in all the facts for the different kinds of thing that 
are already defined. Sometimes we have had to add intermediate classes 
that nobody uses (for example, the kind of thing of which Mortgage 
Backed Security and Asset Backed Security are both a type).    (030)

There was one real difficulty with the pre-existing terms which is that 
the most detailed standard (the ISO 20022 FIBIM or Securities Data 
Model) defines most properties at the highest level at which they might 
apply, and makes everything optional. This means that the business 
knowledge which led, for example, to "Redemption" being a property of 
Security and not of Debt Security has been lost. It is precisely this 
loss of business knowledge that we are trying to address. So where 
possible I have moved these optional terms down the taxonomy to the 
actual things about which they are a fact, but this has not always been 
possible.    (031)

Also, the stated aim of the model is to have all the terms you would 
find in a data model, not just the ones that prove definitive of a given 
kind of thing. This is so each term can have one agreed 
industry-reviewed written definition. The terms are essentially our end, 
not the means to an end in this case. So we know in advance what terms 
we need to define, since they are the terms people need to communicate 
about when exchanging information about securities. But that's a scoping 
question not a semantics question.    (032)

In practice I have not tried to distinguish necessary facts from 
incidental facts. Would that be a useful distinction for the future? I 
think it might be. Also as I may have noted elsewhere, identifying the 
different facets by which different sets of sub-sets are defined would 
be useful.    (033)

> Using the distinction I raised at the beginning, you are referring to
> relabeling a single term in user output, not internally renaming it.  The
> internal name need have no bearing on the output text.
>
>   
Correct. At present the only internal name is the UML GUID. I don't 
think most of the improvements we are discussing can be achieved within 
an existing UML tool, but it would be nice if they could be persuaded to 
develop the tool in these directions. Or, maybe we have a specification 
for a product on our hands?    (034)

Cheers,    (035)

Mike
>> They say "meaning is context" and that's sort of true in a trivial way,
>> but all the context should be definable in an ontology if it's set up
>> right.
>>     
>
> Agreed.
>
> -- doug foxvog
>
>   
>> I hope that is a bit clearer.
>>
>>
>> Mike
>>
>> David Eddy wrote:
>>     
>>> Mike -
>>>
>>> On May 31, 2010, at 1:30 PM, Mike Bennett wrote:
>>>
>>>
>>>       
>>>> We don't rely on words for meanings, and I see no reason why anyone
>>>> would. Terms are either Things or Facts, and each of these has a label
>>>> which happens to be whichever word business domain experts are most
>>>> comfortable with, and any number of synonyms which are other words
>>>> with
>>>> the same meaning.
>>>>
>>>>         
>>> This loses me.
>>>
>>> Best as I've experienced, humans tend to be strongly attached to
>>> terms/words/phrases having meanings.  I am NOT in favor of using
>>> numbers to represent meaning to humans.
>>>
>>> Naturally a huge issue here is that I see a word, recognize it & am
>>> comfortable with the implicit meaning.  You see the same word, which
>>> evokes a different meaning (say "Table" in context of running a
>>> meeting, not furniture, in American English & UK English).  We're
>>> both comfortable with what we assume to be the meaning, but one of us
>>> is wrong.
>>>
>>> What I want to see is a term/word/phrase/acronym plus various
>>> available CONTEXTUAL meanings.  In a document where there are
>>> potentially ambiguous terms, there could be "footnotes," tags, or
>>> "hovering help" expressing explicit meaning.
>>>
>>>
>>>
>>> What do you mean by "Terms are either Things or Facts"?
>>>
>>> ___________________
>>> David Eddy
>>> deddy@xxxxxxxxxxxxx
>>>
>>> 781-455-0949
>>>       
>   
>>> _________________________________________________________________
>>> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
>>> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
>>> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
>>> Shared Files: http://ontolog.cim3.net/file/
>>> Community Wiki: http://ontolog.cim3.net/wiki/
>>> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
>>> To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
>>>
>>>
>>>
>>>
>>>       
>> --
>> Mike Bennett
>> Director
>> Hypercube Ltd.
>> 89 Worship Street
>> London EC2A 2BF
>> Tel: +44 (0) 20 7917 9522
>> Mob: +44 (0) 7721 420 730
>> www.hypercube.co.uk
>> Registered in England and Wales No. 2461068
>>
>>
>>
>>
>>     
>
>
> =============================================================
> doug foxvog    doug@xxxxxxxxxx   http://ProgressiveAustin.org
>
> "I speak as an American to the leaders of my own nation. The great
> initiative in this war is ours. The initiative to stop it must be ours."
>     - Dr. Martin Luther King Jr.
> =============================================================
>
>  
>  
>
>
>       (036)


-- 
Mike Bennett
Director
Hypercube Ltd. 
89 Worship Street
London EC2A 2BF
Tel: +44 (0) 20 7917 9522
Mob: +44 (0) 7721 420 730
www.hypercube.co.uk
Registered in England and Wales No. 2461068    (037)


