
Re: [ontolog-forum] Language vs Logic

To: <edbark@xxxxxxxx>, "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Rich Cooper" <rich@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 14 Sep 2010 10:43:25 -0700
Message-id: <20100914174332.5467C138CF7@xxxxxxxxxxxxxxxxx>

Hi Ed, Dave, Jim, David, et al, comments below,




Rich Cooper


Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2


-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Ed Barkmeyer
Sent: Tuesday, September 14, 2010 8:41 AM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Language vs Logic



Jim Rhyne wrote:

> This is OK as long as you realize that data integrity and data semantics are
> contained in the applications, that you understand these legacy systems well
> enough to be sure you understand the data semantics and that you can
> reproduce them without error. Legacy databases are often full of codes that
> are meaningless except when interpreted by the applications.



Strongly agree.  Reverse engineering a "legacy" (read: existing/useful)
database can be an intensely manual process.  Analysis of the
application code can tell you what a data element is used for and how it
is used/interpreted.  The database schema itself can only give you a
name, a key set, and a datatype.  OK, SQL2 allows you to add a lot of
rules about data element relationships, and presumably the ones that are
actually written in the schema have some conceptual basis.
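
To make the "name, key set, and datatype" point concrete, here is a minimal sketch of what a schema catalog will give up on its own. It assumes SQLite purely for illustration (the thread names no particular DBMS), and the table `acct` and its columns are hypothetical.

```python
import sqlite3

def describe_schema(conn):
    """Return {table: [(column, declared_type, is_pk, not_null), ...]}."""
    out = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for t in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{t}")').fetchall()
        out[t] = [(c[1], c[2], bool(c[5]), bool(c[3])) for c in cols]
    return out

# Hypothetical legacy-style table: the names are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE acct ("
    " acct_id INTEGER PRIMARY KEY,"
    " stat_cd TEXT NOT NULL,"
    " bal NUMERIC)")

schema = describe_schema(conn)
for table, columns in schema.items():
    for column in columns:
        print(table, column)
```

The catalog reports that `stat_cd` is a mandatory text column, and nothing more. Which codes are legal and what each one means would still have to come from the application code or from the people who use it.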


Personally, I have found that most AsIs DBs are useful histories of how people reacted to the expressed interfaces.  The code, which is supposed to interpret the fields, is often not consistent with the way people used the database.  


Also, the reason a new app is being built is normally (not always) because the old app is out of date, and upgrading the code itself is too expensive, so a new ToBe app is a justified development expense.  


But let's distinguish between the executable code and the stored data: the data records how people reacted to the expressed interface, while the code records how the programmers and maintenance team reacted to complaints of functional inappropriateness.  The code is almost never worth keeping, though you might get a few juicy routines out of thousands to millions of lines of code.  The stored data can be informative, but it probably can't be used intact in the ToBe system.  It is, however, a useful trace of requirements information showing how the system was actually perceived by the users.  That, IMHO, is its major contribution to the ToBe system.  


Reverse engineering a database is the process of converting a data
structure model back into the concept model that it implements.  And the
problem is that the "forward engineering" mapping is not one to one from
modeling _language_ to implementation _language_.  It is many-to-one,
which means that a simple inversion rule is wrong much of the time, and
the total effect of the simple rules on an interesting database schema
is always to produce nonsense.  Application analysis has the advantage
of context in each element interpretation; database schema analysis is
exceedingly limited in that regard.


Agreed.  Also, the sheer volume of data, especially when informed by the timeline of data entry, can show the new development team how heavily the AsIs system was actually used, in a way that can be translated into performance requirements for the ToBe system.  But the data itself is almost never useful in the new system.  


That said, other contextual knowledge can be brought to bear.  If, for
example, you know that the database design followed some "information
analysis method" and the database schema was then "generated" (even if
by hand) according to the principles of that method, then you may be
able to recognize "entity" tables and "relationship" tables and
"attributes" and "dependencies" and "code" data types and "date" data
types, and so on.  And if the schema rules are also written according to
the methodology, they can help.  But a different analysis/design method
may beget very similar structures with a different set of conventions
for naming and relationship representation.  For example, is a foreign
key attribute in an entity table an existential dependency or just a
"functional" (1..1 or 0..1) relationship?  And is a subclass represented
by a separate table, or by a code attribute (type name) if it has no
local attributes (as distinct from local relationships)?  And there is
always the question of what "null permitted" means.  (Remember Ted
Codd's "kinds of nothing"?)


So, if you know the design method and believe it was used consistently
and faithfully, you can code a reverse mapping that is complex but
fairly reliable, but you still have to have human engineers looking over
every detail and repairing the weird things. 
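
As a rough illustration of what one rule in such a coded reverse mapping might look like, the sketch below labels a table "relationship" when its primary key consists entirely of foreign-key columns that reference two or more tables, and "entity" otherwise. That rule is an assumption made for this example, not a rule from any particular design method, and SQLite and the tables `emp`, `proj` and `emp_proj` are likewise hypothetical.

```python
import sqlite3

def classify_tables(conn):
    """Guess 'entity' or 'relationship' for each table from keys alone."""
    result = {}
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for name in names:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        # foreign_key_list rows:
        # (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f'PRAGMA foreign_key_list("{name}")').fetchall()
        pk_cols = {c[1] for c in cols if c[5]}
        fk_cols = {f[3] for f in fks}
        targets = {f[2] for f in fks}
        # Assumed convention: a key built only from references to two or
        # more other tables marks a relationship table.
        if pk_cols and pk_cols <= fk_cols and len(targets) >= 2:
            result[name] = "relationship"
        else:
            result[name] = "entity"
    return result

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp  (emp_id  INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE proj (proj_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE emp_proj (
        emp_id  INTEGER REFERENCES emp(emp_id),
        proj_id INTEGER REFERENCES proj(proj_id),
        PRIMARY KEY (emp_id, proj_id));
""")

labels = classify_tables(conn)
print(labels)
```

A schema produced under a different convention (say, a surrogate key on every table) would defeat this rule, which is why the output can only be a first guess for an engineer to review.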


Agreed, with emphasis.


Further, the human
engineers must be familiar with the application domain, and have access
to the business experts and some of the software engineers. 


Those engineers, business managers, and even users are seldom the same as the ones who built the AsIs system, which was likely done years before.  The turnover in SWE is very high now compared to years ago, so expertise with the old system is often very hard to find.  


All of this
translates to a full-blown software engineering project with some
assistance from software analysis tools.  I'm not sure how much easier
that is than using software analysis tools on the applications, and in
this day and age, there is no reason not to use both sets of tools,
especially if the providers have tool sets that work together.




P.S.  OMG has a whole gang of software analysis tool vendors making
standards for interchange of the analytical results, because none of
them alone ever has the right set of capabilities for any major
customer.  They are the Software Assurance Group and the
"Architecture-Driven Modernization" Task Force.  (The latter is ADM, a
play on the OMG "MDA" engineering approach, because they do _reverse_



Since I am not familiar with the very latest commercial tools for the latest DB and SWE representations, you may be right.  But the tools are mostly available for development, not for analysis of code.  With even a few years between the AsIs and ToBe developments, the underlying software technology changes so fast that the old methods are discarded in most new ToBe systems for commercial purposes.  





Edward J. Barkmeyer                        Email: edbark@xxxxxxxx

National Institute of Standards & Technology

Manufacturing Systems Integration Division

100 Bureau Drive, Stop 8263                Tel: +1 301-975-3528

Gaithersburg, MD 20899-8263                FAX: +1 301-975-4694


"The opinions expressed above do not reflect consensus of NIST,

 and have not been reviewed by any Government authority."




> Jim Rhyne

> Software Renovation Consulting

> Los Gatos, California

> http://www.enterprisesoftwarerenovation.com/




> -----Original Message-----

> From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx

> [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Rich Cooper

> Sent: Sunday, September 12, 2010 2:04 PM

> To: '[ontolog-forum] '

> Subject: Re: [ontolog-forum] Language vs Logic


> Hi David,


> You are right-on with a realistic view of how this will progress, IMHO.


> But instead of reverse engineering legacy systems, consider projects to

> reverse engineer legacy DATABASEs.  That is a whole lot more effective and

> way less expensive.  Also, it happens to be one subject discussed in my

> patent at http://www.englishlogickernel.com/Patent-7-209-923-B1.PDF, which I

> mentioned earlier.


> By reverse engineering the database, you can still use whatever remains

> useful of the old data model, the AsIs version.  It helps define what the

> users were actually typing into those fields, just in case the new design

> team wants to know how the users viewed each field, and some of the timing

> and volume measurements can be helpful in estimating performance for the new

> database, the ToBe version.


> Still more information can be reconstructed by analysis of the domains

> actually represented in the data, where often surprising correlations are

> found.  The old Information Flow Framework showed some insightful ways to

> look at actual domain sample vectors, or at least this interpretER saw it

> that way.
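
One simple way to look at the domain actually represented in a field is to tally the distinct values stored in it. The sketch below does only that; the sample values are invented, and folding case and surrounding whitespace before counting is an assumption about how the entries should be compared.

```python
from collections import Counter

def profile_column(values):
    """Return (value, count, share) tuples, most frequent value first."""
    # Assumption: "a " and "A" are the same entry as far as the user
    # was concerned, so fold case and surrounding whitespace.
    counts = Counter(
        v.strip().upper() if isinstance(v, str) else v for v in values)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]

# Invented sample of what users might have typed into a one-letter code field.
sample = ["A", "a ", "X", None, "A", "SEE NOTES", "A"]

profile = profile_column(sample)
for value, n, share in profile:
    print(repr(value), n, f"{share:.0%}")
```

Entries such as free text in a code field, or missing values, are the kind of evidence of how users really viewed the field that the declared schema cannot show.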


> -Rich


> Sincerely,

> Rich Cooper

> EnglishLogicKernel.com

> Rich AT EnglishLogicKernel DOT com

> 9 4 9 \ 5 2 5 - 5 7 1 2


> -----Original Message-----

> From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx

> [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of David Eddy

> Sent: Sunday, September 12, 2010 1:41 PM

> To: [ontolog-forum]

> Subject: Re: [ontolog-forum] Language vs Logic


> Pat -


> On Sep 12, 2010, at 12:19 AM, Patrick Cassidy wrote:


> Context for the group... & reminding Pat, since he's probably

> forgotten he said this...


> I am holding Pat to his statement at a SICoP meeting in approx 2005

> where he said (approximately) that unless "this magic" (e.g.

> ontologies, etc.) was somehow delivered & made accessible to folks in

> the trenches who have zero knowledge, interest or education in

> ontologies, ontologies would be nothing more than an interesting

> academic exercise.




> CONTEXT... I am interested in the potential use of ontology for the

> development/maintenance of software applications.


> I am increasingly coming to the conclusion that ontologies are simply

> NOT relevant to this task.


> Please tell me I'm using the wrong lance to tilt at the wrong

> windmill.  It won't hurt my feelings.





>> Figuring out precisely what a term in an ontology is supposed to

>> mean has

>> three aspects: what the person developing the ontology intends it

>> to mean;

>> what the person reading the documentation interprets it to mean,

>> and what

>> the computer executing a program using the ontology interprets it

>> to mean.

>> Ideally, they will be the same, but they may differ.



> I would argue that since these are highly likely to be three

> different people, with all the differing experiences, perspectives &

> languages that humans tote around as life baggage, "they WILL differ"

> not may.


> Granted my interest in systems development & maintenance may be too

> narrow, I would also argue there are far more people wrestling with

> systems development/maintenance language challenges than people

> building ontologies.





>> so good documentation is critical for

>> ontologies intended to be used by more than a small tightly

>> connected group.



> My money is on the source code being the ONLY accurate documentation

> (assuming, of course, you can find the correct version).  In

> commercial applications, what paper documentation exists may have

> been accurate at one point, but if the system has been in use the

> code is the only accurate record.  [I'd like to think weapons systems

> & nuclear power plants hold to a higher standard, but I have no

> experience here.]


> This is in fact one of the great language challenges... as a system

> transitions from paper specifications & documentation through

> development into production and on to new teams of project managers &

> developers (whose native language is likely NOT English), the intent

> of the original language begins to mutate since there is no formal

> process to ensure subsequent generations of maintainers (project

> managers & coders) continue to use the same language & meanings.


> Whereas the compiler will force you to use correct VERBS, there is no

> such constraint on the NOUNS... which is why organizations end up

> with literally hundreds of names/nouns for the same thing.


> The CD/CDE (as abbreviation for CODE) example is from just such an

> experience.  The original IMS DBA enforced CD as the single correct

> abbreviation for several years in the initial system building phase.

> She left & a new DBA took over.  A new segment was added & he

> evidently liked to abbreviate CODE as CDE.  There was no automated

> mechanism like a compiler to ensure or "encourage" him to use CD

> rather than CDE.  The problem comes when one searches for "-CD "

> (note the space suffix, since CD was used as a suffix in data

> element names) you will NEVER find "-CDE ".  The devil is in the

> details.
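
The CD/CDE search problem is easy to reproduce. In the sketch below the data element names are invented for illustration; only the two abbreviations and the trailing-space convention come from the account above.

```python
import re

# Hypothetical data element names, each padded with a trailing space.
names = ["ACCT-STAT-CD ", "CUST-TYPE-CD ", "ORDER-RSN-CDE ", "CDE-TABLE-ID "]

# The literal search for "-CD " silently misses the "-CDE " element.
literal_hits = [n for n in names if "-CD " in n]

# A pattern that accepts either abbreviation as a suffix finds both,
# without matching CDE used as a prefix.
suffix = re.compile(r"-(CD|CDE)\s*$")
pattern_hits = [n for n in names if suffix.search(n)]

print(literal_hits)
print(pattern_hits)
```

The pattern only helps once someone already knows that a second abbreviation exists, which is the point of the story: nothing in the toolchain reports the variant.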


> In a system that adheres to "good names" one learns that the name of

> something & what it is are in fact the same.  In the physical world

> there are multiple forces (the dairy, the food inspectors, the grocery

> store) to ensure a jug labeled "milk" actually contains milk.  We

> haven't quite learned this lesson yet in systems.





>> For me, good documentation means to state what one intends the

>> ontology element to mean,



> The way you present this I interpret as saying the ontology needs to

> be done BEFORE the system.


> This is, of course, not acceptable since the vast majority of systems

> are up & running & have been built/maintained without any

> consideration at all to an ontology(s).


> I don't consider reverse engineering ontologies from existing systems

> to be practical.  Primary argument... since the system owner does not

> consider it cost effective to maintain accurate, current

> documentation, they're certainly not going to spend money/time on

> reverse engineering an ontology.  I also factor in that the "reality"

> I look at is a small organization of 10,000 people, with 900

> systems.  Last year ComputerWorld said IBM, with 400,000 people had

> 4,500 "applications" (same as/different from systems? ...who knows).


> I am at pains to point out that each one of these

> "applications" (whatever an application is) was built by different

> people at different times for different objectives.  Then maintained

> by different people... all these actors bringing different language

> to the task.





>> To some extent,

>> learning to use a logic-based ontology is similar to learning to

>> use a new

>> object-oriented programming language, but programming languages

>> usually come

>> with a library of immediately useful applications as learning

>> examples.  We

>> haven't reached that point yet in the technology of ontology

>> creation and

>> dissemination.




> Long, long ago I was beginning to work on my last programming

> assignment.  I angered the architect (not a word in use then) by

> telling him I did not want to LEARN CICS (at that point a HOT

> language), rather I wanted to USE it.  Took about 10 years, but he

> finally came around to understanding what I was saying.  His

> templates (what we'd call frameworks today) were absolutely

> brilliant.  From a standing start (e.g. knowing absolutely nothing

> about CICS) I was able to take his templates & get 17 CICS programs

> working in 2 weeks.


> Twenty years later I was looking at a cross platform development

> tool... and was astonished to find a template/framework tool for

> $350.  The earlier templates probably cost the client $500,000+.


> This is the standard I hold an ontology tool to... it better not be

> any more complex than a spell checker/dictionary.  Clearly there's a

> ways to go.





>>  For the time being, I look first at the logical axioms associated

>> with a

>> term in an ontology, then at the documentation (usually contained

>> in the

>> "comments" section of an ontology element)



> You keep using words that are difficult to grok...

> "documentation"?    "comments"?  They are outside my experience.   :-)



> Here's what I consider to be documentation...


> a = b * c


> Totally accurate & not very useful.  More precisely... USELESS!

> Unfortunately there's a lot of this.


> The same logical statement:


> wkly-pay = hrs-wkd * rate-pay


> Is now potentially comprehensible.


> If I can determine that this code is in a payroll module then I'm going

> to assume that "pay" is likely a dollar & cents amount.  If I can

> deduce this from just the name without needing to ask someone or fish

> around in some questionable documentation, then I'm a happy camper.
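
The same contrast can be written out in a modern language. This is only a sketch: the function name, the reading of "rate-pay" as an hourly rate, and the rounding to cents are all assumptions layered on the two statements above.

```python
from decimal import Decimal

# The opaque form: correct, but it says nothing about what is computed.
b = Decimal("37.5")
c = Decimal("21.40")
a = b * c

def weekly_pay(hours_worked: Decimal, pay_rate: Decimal) -> Decimal:
    """Gross weekly pay in dollars and cents.

    Assumes pay_rate is dollars per hour and rounds to the nearest cent
    using Decimal's default rounding (round-half-even).
    """
    return (hours_worked * pay_rate).quantize(Decimal("0.01"))

pay = weekly_pay(Decimal("37.5"), Decimal("21.40"))
print(pay)
```

Here the names carry the meaning, and the docstring records the units and rounding that the names alone cannot.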


> But what I would really like is the ability to hot-key/right click on

> these variables & see what they mean.  I think this look-up facility

> is possible in modern editors like Eclipse... but someone has to dig

> up what the words mean in the context of their specific use... which

> may or may not say anything about their meaning somewhere else in the

> system.


> ___________________

> David Eddy

> deddy@xxxxxxxxxxxxx


> 781-455-0949


Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
