
Re: [ontolog-forum] Language vs Logic

To: ontolog-forum@xxxxxxxxxxxxxxxx
Cc: Ken Orr <kenorr@xxxxxxxxxxxxxx>, Arun Majumdar <arun@xxxxxxxxxxxx>
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Tue, 14 Sep 2010 20:50:17 -0400
Message-id: <4C901849.80206@xxxxxxxxxxx>
Jim, Ed, Dave, and Rich,    (01)

I completely agree with what you say below.  But I will add that
proper tools can make an improvement of *many orders of magnitude* --
e.g., going from 80 person-years of tedious work to 15 person-weeks
of more exciting stuff.    (02)

I realize that you're not going to believe what I say below, but you
can verify it by asking Ed Yourdon.  He did the initial consulting
before recommending Arun Majumdar and Andre Leclerc for a short
study project, which ended up delivering exactly what a major
consulting firm claimed would take 80 person-years by the hand
methods that you describe.    (03)

Another person who is familiar with the project and all parties
involved is Ken Orr (on cc list above).    (04)

JR>> This is OK as long as you realize that data integrity and data
>> semantics are contained in the applications, that you understand
>> these legacy systems well enough to be sure you understand the data
>> semantics and that you can reproduce them without error.  Legacy
>> databases are often full of codes that are meaningless except when
>> interpreted by the applications.    (05)

EB> Strongly agree.  Reverse engineering a "legacy" (read: existing/useful)
> database can be an intensely manual process.  Analysis of the
> application code can tell you what a data element is used for and how it
> is used/interpreted.  The database schema itself can only give you a
> name, a key set, and a datatype.  OK, SQL2 allows you to add a lot of
> rules about data element relationships, and presumably the ones that are
> actually written in the schema have some conceptual basis.    (06)

I also agree.  But it is possible to analyze the executable code and
compare it to *all* the English (or other NL) documentation -- that
includes specifications, requirements documents, manuals, emails,
rough notes, and transcriptions of oral remarks by users, managers,
programmers, etc.    (07)

For a brief summary of the customer's requirements, the method by
which Arun and Andre conducted the "study", and the results --
delivered on one CD-ROM, and exactly what the customer wanted --
see slides 91 to 98 of the following:    (08)

    http://www.jfsowa.com/talks/iss.pdf    (09)

Just type "91" into the Adobe counter at the bottom of the screen
to go straight to those slides.    (010)

EB> Reverse engineering a database is the process of converting a data
> structure model back into the concept model that it implements.  And the
> problem is that the "forward engineering" mapping is not one to one from
> modeling_language_  to implementation_language_.  It is many-to-one,
> which means that a simple inversion rule is wrong much of the time, and
> the total effect of the simple rules on an interesting database schema
> is always to produce nonsense.  Application analysis has the advantage
> of context in each element interpretation; database schema analysis is
> exceedingly limited in that regard.    (011)
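Ed's point about the many-to-one forward mapping can be made concrete with a small sketch.  This is purely illustrative -- the rule names and SQL patterns below are invented, not drawn from any real methodology -- but it shows why inverting the mapping mechanically produces ambiguity: several distinct conceptual constructs compile to the same implementation structure, so a simple inversion rule cannot tell them apart.

```python
# Illustrative only: a hypothetical forward-engineering rule set,
# mapping conceptual constructs to the SQL patterns that implement them.
from collections import defaultdict

FORWARD_RULES = {
    "one-to-many relationship": "foreign key column",
    "weak entity":              "foreign key column",
    "subtype":                  "foreign key column",
    "enumerated domain":        "integer code column",
    "unit-bearing measure":     "integer code column",
}

# Inverting the mapping: SQL pattern -> set of candidate concepts.
# Because the forward mapping is many-to-one, the inverse is one-to-many,
# and every candidate set with more than one member is an ambiguity
# that only application-code or documentation context can resolve.
inverse = defaultdict(set)
for concept, pattern in FORWARD_RULES.items():
    inverse[pattern].add(concept)

for pattern, concepts in sorted(inverse.items()):
    print(f"{pattern!r} could implement any of: {sorted(concepts)}")
```

Every SQL pattern here maps back to two or three possible concepts, which is exactly why "the total effect of the simple rules on an interesting database schema is always to produce nonsense" without contextual analysis.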

That is part of the job, but it doesn't solve the problem of 40 years
of legacy code with numerous patches and outdated documentation.
The customer's problem was (1) to *verify* the mapping between
documentation and implementation and report all discrepancies
(or at least as many as could be found), (2) to build a glossary of
all the English terminology with cross references to all the changes
over the years, (3) to build a data dictionary with a specification
that corresponded to the implementation, not to the obsolete
documentation, and (4) to cross reference all the English terms
with all the programming and database terms and all the changes
over the years.    (012)
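Task (4) amounts to building a bidirectional index between English terminology and implementation identifiers.  The real system aligned these with NLP; the sketch below substitutes a hand-made alias table so the data structure is visible.  All terms and identifiers are invented for illustration.

```python
# Invented sample data: English glossary terms mapped to the database
# and program identifiers that implement them.
glossary = {
    "customer account": ["CUST_ACCT", "acct_no", "CustomerAccount"],
    "billing cycle":    ["BILL_CYC", "cycle_id"],
}

# Reverse index: identifier -> documented English term.
xref = {ident: term
        for term, idents in glossary.items()
        for ident in idents}

def lookup(identifier: str) -> str:
    """Return the documented English term for a code identifier,
    or a review flag when no documentation covers it."""
    return xref.get(identifier, "<undocumented -- flag for review>")

print(lookup("acct_no"))
print(lookup("MYSTERY_FLD"))
```

The interesting output in practice is the second kind: identifiers that fall through the index are exactly the discrepancies between documentation and implementation that task (1) asks to be reported.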

EB> That said, other contextual knowledge can be brought to bear.  If, for
> example, you know that the database design followed some "information
> analysis method" and the database schema was then "generated"...    (013)

Good luck when some of the programs predated any kind of "methods",
other documentation was lost years ago, and the people who wrote
or patched the code retired, died, moved on, or just forgot.    (014)

EB> So, if you know the design method and believe it was used consistently
> and faithfully, you can code a reverse mapping that is complex but
> fairly reliable, but you still have to have human engineers looking over
> every detail and repairing the weird things....    (015)

Arun and Andre were the two engineers who checked anything that the
computer couldn't resolve automatically.  And the computer did indeed
discover a lot of weird stuff.  Look at slides 95 to 97 for a tiny
sample of weird.    (016)

But as they continued with the analysis, Arun and Andre found that
the computer's estimate of how certain it was about any conclusion
was usually right.  They raised the threshold, so that the computer
wouldn't ring a bell to alert them unless it was really uncertain
about some point.    (017)
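The threshold scheme is simple to state in code.  The numbers, field names, and scoring below are all invented -- the message does not describe how the real system computed its confidence values -- but the control flow is the one described: each conclusion carries a confidence, and only those below the (progressively raised) threshold ring the bell for human review.

```python
# Invented example: conclusions the analyzer has drawn, each with a
# confidence score in [0, 1].  Only low-confidence conclusions are
# escalated to the human engineers.
ALERT_THRESHOLD = 0.6  # raised over time as the scores proved reliable

conclusions = [
    ("CUST_TYPE code 7 means 'government'", 0.95),
    ("Field FILLER2 is unused",             0.40),
    ("DATE8 fields are formatted YYYYMMDD", 0.88),
]

flagged = [(claim, conf) for claim, conf in conclusions
           if conf < ALERT_THRESHOLD]

for claim, conf in flagged:
    print(f"ALERT ({conf:.2f}): {claim}")
```

Raising the threshold shrinks the flagged list, which is how two engineers could review "every detail" of a job estimated at 80 person-years: the machine settled the high-confidence cases, and the humans saw only the residue.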

DMcD> Most of the legacy systems we see were forward engineered once upon
> a time, but then modified in place, without going through the original
> model to design to code process.  So you have a mix of things that can
> be faithfully reverse engineered mixed in with things that just got bolted on.    (018)

Yes.  And when the code is up to 40 years old, there are a lot of
ad hoc bolts.  That's why the big consulting firm estimated that it
would require 80 person-years to do the job.  But Arun and Andre
did it in 15 person-weeks (while the computer worked 24/7).    (019)

RC> Personally, I have found that most AsIs DBs are useful histories
> of how people reacted to the expressed interfaces.  The code, which
> is supposed to interpret the fields, is often not consistent with
> the way people used the database.    (020)

That's true.  That's why you need to relate the implementation
to *all* the documentation by users of every kind as well as
by managers and programmers of every kind.  They all have
different views of the system, and it's essential to correlate
all their documents and cross reference them to each other and
to the actual implementation.    (021)

EB> Yes, you can be stuck with maintenance programmers and ignorant
> users.  But that means you are genuinely flying blind with respect
> to the actual data content and intent...    (022)

Yes, that's why the customer asked the consulting firm to analyze
all their software and all their documentation.  When that estimate
was too high, they asked Ed Yourdon for a second opinion, and he
called in Arun and Andre.  They delivered everything that the big
firm had claimed would require 80 person-years.    (023)

Please read the slides.  And as I said, you don't have to take
my word for it.  There is also a Cutter Consortium technical
report written by Andre and Arun.  Ask Arun for a copy.  But
it doesn't say as much about the NLP technology as I wrote
on the iss.pdf slides.    (024)

John    (025)

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx    (026)
