ontolog-forum

Re: [ontolog-forum] UML and Semantics

To: "[ontolog-forum] " <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "David C. Hay" <dch@xxxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 04 Dec 2012 11:41:21 -0600
Message-id: <7.0.0.16.2.20121204104819.03ec1cb0@xxxxxxxxxxxxxxxxxxxxxxx>
Ed, thanks:

At  12/4/2012 12:30 AM, you wrote:
I'm with William Frank on the distinction between what the UML semantics IS and what vendors and most users think UML IS FOR.  The primary use of UML is to design O-O solutions, and to a lesser extent, to document XML schemas.  But as William pointed out, the primary use of Chen's E-R is/was to design relational DBs.  And the only use of an E-R language like IDEF1-X is to do so.  If you really want a semantic modeling language, you might look at something like NIAM/ORM, but even then the primary use was to design databases.  The difference is that NIAM/ORM does not force solution paradigms, like attributes vs. relationships, or strange handling of multivalued attributes, on the model itself.

I agree that most E/R modeling is done to support relational database design.  IDEF1X can only be used for that.  Presenting an IDEF1X model to non-technical domain experts, as far as I know, cannot be done. Information engineering is better, but Finkelstein and Martin never addressed identifying relationships.  So, when ERwin decided to add IE to its repertoire, it borrowed the notation from IDEF1X, considerably biasing and complicating the notation. And, while you don't have to show them, foreign keys are a big part of the IE approach.

My preference is the notation created by Richard Barker and Harry Ellis.  It is more spare than any of the others, and cardinality is graphic, not in text.  Most importantly, they came up with an approach to naming relationships that very precisely captures the semantics of the model.  That approach can be used in any notation, but you have to be willing to do so.

(Ironically, that approach is the basis for RDF: <subject class> <predicate> <object class>.
The only problem with that is that the designers of RDF didn't realize the implications of that approach and botched the predicates they defined for the underlying language.  Instead of saying "Charles rdf:isExampleOf Person", you say "Charles rdf:type Person".  Charles is a type of person?  I don't think so.)
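To make the triple pattern concrete, here is a minimal Python sketch that reads a triple aloud as a sentence. It uses plain tuples rather than any RDF library. `rdf:type` is the predicate RDF defines; `rdf:isExampleOf` is the hypothetical alternative named above and is not part of RDF. The names `Triple` and `read_aloud` are assumptions made for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One assertion of the form <subject> <predicate> <object>."""
    subject: str
    predicate: str
    obj: str


def read_aloud(t: Triple) -> str:
    """Render the triple as a plain sentence, in subject-predicate-object order."""
    return f"{t.subject} {t.predicate} {t.obj}"


# The predicate that RDF actually defines for class membership.
actual = Triple("Charles", "rdf:type", "Person")

# Hypothetical predicate from the message above; NOT defined by RDF.
proposed = Triple("Charles", "rdf:isExampleOf", "Person")

for triple in (actual, proposed):
    print(read_aloud(triple))
```

Both triples carry the same assertion; the complaint is only about how well the predicate reads as part of a sentence.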


As William pointed out, both E-R apostles and later O-O apostles insisted that their modeling approach was a natural means of capturing the concepts of the domain, from which a solution model could be derived more or less by rote.  The consequence was that the concepts of the domain were forced into the modeling structures that produced the desired solution.  How many E-R modeling guidelines will tell you whether a vehicle model (like BMW 328i) should be a class/entity to which an individual vehicle is related or an attribute of the individual vehicle that is represented by a String?  The conceptual model is that the vehicle model is a class/entity in its own right -- the separate question is how you choose to represent that relationship in a computational solution.
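The two representations of the vehicle model mentioned above can be sketched roughly as follows. This is only an illustration in Python; the class names, field names, and sample VIN values are assumptions, not anything taken from a published guideline.

```python
from dataclasses import dataclass


# Option A: the vehicle model is only a String attribute of the vehicle.
@dataclass
class VehicleWithString:
    vin: str
    model_name: str


# Option B: the vehicle model is a class/entity in its own right,
# and each individual vehicle is related to it.
@dataclass(frozen=True)
class VehicleModel:
    make: str
    name: str


@dataclass
class Vehicle:
    vin: str
    model: VehicleModel


bmw_328i = VehicleModel(make="BMW", name="328i")
car1 = Vehicle(vin="VIN-0001", model=bmw_328i)  # hypothetical identifiers
car2 = Vehicle(vin="VIN-0002", model=bmw_328i)

# Under option B, facts about the model (such as its make) are stated once
# and shared by every vehicle related to it.
print(car1.model is car2.model)
```

Option B matches the conceptual model described above; option A is one possible way to represent that relationship in a computational solution.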

I agree that modeling guidelines are poor.  That's the flag I am carrying.  With limited effect, I fear.


UML 2.x definitely tried to define its semantics to apply to arbitrary things, not just data records with attached operations.  But the tool builders continued to emphasize, and extend, the interpretations and supporting tools that fit their primary market -- the software designers and implementors.

My point exactly.


1.  In UML, there is no real constraint on what constitutes a 'class', but David seems to imply that there is some clear constraint on what constitutes an 'entity'.  The point in UML is that a class can be 'elephant' and refer to the real mammals conceptually, and an implementation that manages a zoo can recognize and describe in UML the relationship between the software 'class' and the mammal 'class'.  The idea that a class can be mammals or software objects is not, IMO, a weakness.  It means you have to specify the intent of your model.  And in this, William and I concur.

I think I agree with this.  The zoo example is problematic.  The point I was making is that if you are describing a zoo, you don't want to describe the kinds of technology you might choose to keep track of the animals.

My point is that an E/R model should be concerned with a particular domain, and the entity types/classes included should only be about the domain, not about technology.


2.  It is absolutely true that UML still has a lot of O-O programming baggage, like visibility, that has not been sorted out into "stereotypes" and "profiles" and other such dialect mechanisms, precisely because of the perceived target market.  No rose is without thorns.  You don't have to use, or show, visibility, or static or default values, or any other such Java/C++ isms.  (You may have to work to convince the UML tool that you don't want to see that junk.)

The first month I spent on this exercise was figuring out how to turn off the things in the MagicDraw tool that I wasn't interested in. . .


3.  A UML association IS a "relationship" in exactly the same sense that an ER "relationship" is.  The problem is that, like an ER relationship, it is two or three views of the same conceptual relationship.  The UML 'navigation' stuff is conceptually about which of those views is intended.  (Of course, 90% of UML modelers think it is about whether there is a pointer-to-X member of the object structure represented by the related classes.

No, I must disagree.  You correctly point out that the E/R model has a relational bias.  Most of that should be dispensed with in a conceptual model, but E/R modeling does inherit the underlying fact that it is about how things are related to each other.  If you have asserted that A is related to B, that is a complete sentence. You cannot separately have A is related to <something> and <something> related to B.  The E/R approach is based on relational theory being fundamentally declarative. The UML approach is based on OO programming having to deal with "namespaces".

The great insight of object-oriented programming was that it is much more powerful for being organized around the objects being manipulated than around the processes.  But, (much to my surprise when I discovered this), many object-oriented programmers remain programmers.  Even the definition of class is that it is a piece of program code.

But then, most E-R modelers think a relationship involves explicit foreign key columns, too.) 

I know, I know.  This is what puts me at odds with my colleagues in this area as well.

If the relationship is navigable FROM either or both of the classes, there is a "property" of things in that class that refers to the corresponding instances of the other class.

But the problem is that that "property" does not recognize the existence of the other class.  It is dependent upon the "roleName" to label it.  Thus "roleName" is not a predicate.  This is different--big time.

 In formal logic terms (not far from database science terms), there is a function that maps members of that class to members of the related class.  If neither end of the relationship is navigable, it is a conceptual relationship imposed on the two classes from outside, and there is no such property.  In formal logic terms, there is only a mapping from pairs to truth values. 
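The two formal readings described above can be sketched in Python. The zoo data and the names `cares_for`, `holds`, `animals_of`, and `keepers_of` are invented for illustration only.

```python
# Hypothetical zoo data, for illustration only.
cares_for = {
    ("alice", "dumbo"),
    ("alice", "leo"),
    ("bob", "babar"),
}


def holds(keeper: str, animal: str) -> bool:
    """Non-navigable reading: a mapping from pairs to truth values."""
    return (keeper, animal) in cares_for


def animals_of(keeper: str) -> set:
    """Navigable from the keeper end: a property of a keeper that maps
    it to the corresponding instances of the other class."""
    return {a for (k, a) in cares_for if k == keeper}


def keepers_of(animal: str) -> set:
    """Navigable from the animal end: the same association, viewed
    from the opposite class."""
    return {k for (k, a) in cares_for if a == animal}


print(holds("alice", "dumbo"))
print(sorted(animals_of("alice")))
print(sorted(keepers_of("babar")))
```

All three functions are derived from the same set of pairs, which is the sense in which the navigable properties are views of one conceptual relationship.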

UML is in this sense more expressive than E-R languages that only name the relationship (or is that the table?) and provide only one "reading" in Nijssen's terminology.

How is it more expressive?  UML does allow naming in both directions, just as my version of E/R does.  But the name involved is just a roleName, not a predicate. 


In addition, UML has the correct idea that a relationship/association can specialize or imply another relationship, and there are several distinguishable behaviors under that general idea that UML captures using generalization, redefinition and subsetting of properties.  (These concepts also exist in NIAM/ORM.  I'm not sure whether David's unidentified favorite E-R language has them.)  And these are far stronger as semantic concepts than as implementation concepts -- many implementation languages cannot really support them.

The notion that relationships can have other relationships as sub-types is something I discovered I needed in a previous book.  I was distressed that neither my notation nor any other E/R notation that I found covered it.  I was excited that I had discovered something new!

Imagine my disappointment when I discovered that UML (UML, of all things!) covered that exact concept.  And now, so does the semantic web.  So, yes, I took advantage of that in my books.


I don't really want to argue for the value of the soft and hard containment notions that can be added to UML relationships ("aggregation" and "composition"), for several reasons, but some conceptual modelers find them useful, by making clear the intended axiomatic interpretations.

If your syntax allows you to assert that each Order may be composed of one or more Line Items, then you don't need any other symbols for that. But plain vanilla UML does not permit that.  All you can say is that Order has a property that is labeled order line (or some such). There is another class hidden back there, but it's not part of the property.  Since composition is a common (and pretty important) kind of relationship, the UML folks decided it needed its own symbol.  Unfortunately, there is no symbol for member of, player of, or any of the other infinite number of possible relationship predicates. 

One interesting thing about the pair of symbols for composition is that they do address two-thirds of the relational problem of referential integrity.  "Composition" says that you cannot delete the parent in a relationship.  "Aggregation" says that if you do, the children are left as orphans.  There is no symbol for "referential delete"--if you delete the parent, all the children are also deleted.
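The three delete behaviors named in the paragraph above (refuse the delete, leave orphans, delete the children too) can be sketched as follows. This Python sketch models only the behaviors themselves and takes no position on which UML symbol corresponds to which. The names `OnDelete` and `delete_parent` and the sample data are assumptions for illustration.

```python
from enum import Enum


class OnDelete(Enum):
    RESTRICT = "restrict"  # the parent cannot be deleted while children exist
    ORPHAN = "orphan"      # children survive with no parent
    CASCADE = "cascade"    # children are deleted along with the parent


def delete_parent(parent_id, children, rule):
    """Return the child-to-parent mapping after deleting `parent_id`.

    `children` maps each child id to its parent id (or None for an orphan).
    The input mapping is not modified.
    """
    owned = [c for c, p in children.items() if p == parent_id]
    if rule is OnDelete.RESTRICT:
        if owned:
            raise ValueError(f"{parent_id} still has children: {owned}")
        return dict(children)
    if rule is OnDelete.ORPHAN:
        return {c: (None if p == parent_id else p) for c, p in children.items()}
    # CASCADE: drop every child that belonged to the deleted parent.
    return {c: p for c, p in children.items() if p != parent_id}


# Hypothetical order lines, keyed by line item id.
line_items = {"li1": "order1", "li2": "order1", "li3": "order2"}

print(delete_parent("order1", line_items, OnDelete.ORPHAN))
print(delete_parent("order1", line_items, OnDelete.CASCADE))
```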
 

I think David is mostly right that UML in toto -- the 17 sub-languages -- do borrow from finite state machine diagrams, Petri nets, and other representations of process, but surely that aspect is out of scope in comparing support for domain concepts that is comparable to E-R.

David has been arguing for 15 years that UML cannot be used to do E-R modeling, while many of us, including Jim Rumbaugh, Jim Odell, and apparently William Frank, not to mention Ed Barkmeyer, have been doing just that for all or most of that same time period.  Admittedly, we had to ignore what UML v1.4 said some of the constructs meant, but that has not been true since the advent of UML v2.  Now, if David has a particular well-defined E-R modeling language in mind, perhaps we can compare it with UML, which has an ISO standard definition.

I was startled to learn from one of the reviewers of my Enterprise Model Patterns book that I have created something called a "Domain Specific Language".  I did not know that. I didn't define stereotypes for "entity type" and "predicate", because one of the elements I require in creating my kind of E/R model is that the diagram not be cluttered with things that don't contribute to its meaning.  But I do assert that if you are looking at a "highly abstract yellow" (I use yellow to highlight the latest additions in a model build-up sequence) or "HAY" model, these are the things you are looking at.  But yes, I apply my definitions of terms consistently.

So apparently it is legal UML after all!


-Ed

P.S.  I don't think UML is a better concept modeling language than NIAM/ORM or OWL, but it is rather better supported by available tooling, and before I grow too old to care, it may finally be integrated with OCL.  The biggest weakness of NIAM/ORM has always been the absence of tools and a standard exchange form, and the biggest weakness of OWL is the absence of standard diagrammatic form for classes and properties.  OMG created one, as a UML dialect ("profile"), but it has little support; other OMGisms got in the way.  In conceptual modeling a picture really is worth 1000 words.

I agree.   Thanks for this, Ed.



--
Edward J. Barkmeyer                       Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Engineering Laboratory -- Systems Integration Division
100 Bureau Drive, Stop 8263               Office: +1 301-975-3528
Gaithersburg, MD 20899-8263               Mobile: +1 240-672-5800
________________________________________
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of David C. Hay [dch@xxxxxxxxxxxxxxxxxxxxxxx]
Sent: Monday, December 03, 2012 3:40 PM
To: [ontolog-forum]
Subject: [ontolog-forum]  UML and Semantics

At  12/2/2012 11:57 PM, John Sowa wrote:
KI
> entity relationship model semantics can exist in self-describing structured data.

Yes.  E-R diagrams were introduced in 1976.  Even earlier, there were
Bachman diagrams, type hierarchies, and Petri nets.  Versions of all
those diagrams were combined in UML,

No.  UML was developed specifically to support object-oriented program design, and had only minimal relationship to the earlier E/R notations.

and mainstream programmers used
them to specify programs and databases. They developed tools to draw
the diagrams and map them to and from the software.

Just two kinds of UML diagrams provide 99% of the useful subset of OWL:
type hierarchies and E-R diagrams.  Controlled English could be added
as a very readable supplement or extension.  If the SW had adopted that
as the official strategy, mainstream IT would have a smooth migration
path to ontology-based tools.

I realize that "UML" has captured the imaginations of a lot of data modelers.  The problem is that the UML class diagram, as originally designed, doesn't address semantics at all.  It is not a "useful subset of OWL type hierarchies and E-R diagrams".  It is something quite different.

In the OO world it came from, a "class" simply refers to a piece of program code that describes something to be manipulated by the computer.  Only accidentally might a class refer to a collection of things in the world.  As a consequence, there are several important implications:

1. There are no constraints as to what constitutes a "class".  It might be a class of things in the world ("person", "project", etc.), or a class of things to be manipulated by the computer ("screen", "interface", etc.)  In semantic modeling, we are concerned only with classes of things of significance to an organization. John previously wrote a compelling essay on the conflict between calling these "entity classes" or "entity types", but I don't want to go into that here.  The point is, they are not OO classes.

2. There are all kinds of bits of notation that apply only to the programming world.  ("visibility", etc.)

3. (And this is most important), an "association" is not about structure.  In the E/R world, a Relationship is about two (entity) classes.  It represents the structure of something in the world, as expressed in a sentence of the form: <subject entity class> <predicate> <object entity class>.  Thus, a relationship name represents a predicate in a semantic assertion.   In UML, an association is a navigation path for software to traverse.  This means that a "role name" is not a predicate at all.  It is simply a label for the second class.  (You see, even though both an attribute and an association are "properties" of the first class, the second class at the other end of the association is not permitted to be part of that property.  (It's about something called a "namespace".  Don't ask.))

I spent some time hanging out with the Object Management Group people and was inundated with the propaganda that "Oh, thanks to 'stereotypes', you can get UML to cover anything".  It took a long time for me to get my head around point 3, above, but I finally decided to address the problem.

So, in my latest patterns book, Enterprise Model Patterns: Describing the World, I decided to use UML instead of my favorite notation.  By addressing the premises I just mentioned, it turns out I could use a modified form of the UML class notation to create semantic models.

(Among other things, I had to add the Barker/Ellis standard for naming relationships to incorporate semantics.  Thus, UML "roleNames" became E-R predicates.)
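As a rough illustration of reading a relationship end as a sentence, the Python sketch below builds sentences shaped like the Order / Line Item example that appears earlier in this thread. The exact template wording ("must be" / "may be", "one or more" / "one and only one") and the names `RelationshipEnd` and `sentence` are assumptions for illustration, not a definitive statement of the Barker/Ellis standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipEnd:
    """One direction of a relationship, read as a sentence."""
    subject: str      # subject entity class
    predicate: str    # relationship name, phrased to follow "be"
    obj: str          # object entity class, singular
    obj_plural: str   # object entity class, plural
    mandatory: bool   # True -> "must be", False -> "may be"
    many: bool        # True -> "one or more", False -> "one and only one"

    def sentence(self) -> str:
        optionality = "must be" if self.mandatory else "may be"
        if self.many:
            return (f"Each {self.subject} {optionality} {self.predicate} "
                    f"one or more {self.obj_plural}.")
        return (f"Each {self.subject} {optionality} {self.predicate} "
                f"one and only one {self.obj}.")


order_to_items = RelationshipEnd(
    "Order", "composed of", "Line Item", "Line Items",
    mandatory=False, many=True)
items_to_order = RelationshipEnd(
    "Line Item", "part of", "Order", "Orders",
    mandatory=True, many=False)

print(order_to_items.sentence())
print(items_to_order.sentence())
```

The point of the sketch is that the relationship name sits between two entity classes and completes a sentence, whereas a bare role name only labels the class at the far end.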

It worked!  I would submit that that book is a candidate for a "high-level" ontology.

Because I had now convinced my E/R buddies that I had "gone over to the dark side" by using UML--as well as annoying my UML buddies for having completely bastardized their notation--I wrote a companion, UML and Data Modeling: A Reconciliation.  This book specifically addresses the issues of trying to describe semantics with a notation that was not originally designed for it.

I welcome any comments and responses.

Dave Hay
 
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
 
