Rich,
Even if you are talking about AsIs Data base, Ed Yourdon is
the person started Structure Analysis concept later he wrote books about
Object Oriented Analysis and Design with Peter Coad ( I have
signed copy of the book .) I remember attending Peter Coad's
lecture when he released the book in 1991.
People who studied these topics, follow the some analysis and design
techniques all their lives.
This is the most interesting topic I read in a long
time.. Thanks Dr. Sowa,
Pavithra
--- On Tue, 9/14/10, Rich Cooper <rich@xxxxxxxxxxxxxxxxxxxxxx>
wrote:
From: Rich Cooper <rich@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [ontolog-forum] Language vs Logic
To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
Date: Tuesday, September 14, 2010, 9:32 PM
John,
Jim, Ed, Dave and et al,
I
think this thread is going in the wrong direction entirely. It
seems to me that the effort should be focused on the English statements
found in the AsIs database, and the ways in which AsIs users made errors
in the database.
In
a recent case, I was able to show that over 40% of the rows in an actual
database had errors of one kind or another in them – and not just
the English statement fields, even the so called structured fields are
like that.
To
show that, I had to write a program to extract the actual data in the
database, not just the data fields which happen to be compliant with the
meta data specifications. Often, the entered data is
"cleaned" by the software to meet the specs, but it often isn't
the same semantics that the user entered, and is often clearly
semantically different than what user intended. Users just
don’t much care how much sweat the managers, BAs, SysEs and SWEs
have expended on their behalf. They just use it to get their jobs
done with a minimum of attention. Remember that collecting data is
not their major concern – they want the sales to go through no
matter what they have to do to get there.
Reengineering
a database (not the code, but the data and data model) is itself a
discovery process that has the usual four lower level processes running
concurrently. These are <Experimenting, Classifying, Observing,
Theorizing>, as shown in the following chart of the sixteen way
interactions among the four processes:
Where
each box in the figure begins with a number, that number correlates to a
paragraph or longer description of the interaction for those two
processes in the document at www.englishlogickernel.com/Patent-7-209-923-B1.PDF
The approach described there uses corpus analysis and context modeling to
discover the semantics used in the actual database. Knowing that
the information was perceived in a particular way by each front end user
helps the development architects figure out how to design the ToBe
system, and without it, that information is normally not available to the
ToBe system designers.
-Rich
Sincerely,
Rich
Cooper
EnglishLogicKernel.com
Rich
AT EnglishLogicKernel DOT com
9
4 9 \ 5 2 5 - 5 7 1 2
-----Original
Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx]
On Behalf Of John F. Sowa
Sent: Tuesday, September 14, 2010 5:50 PM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Cc: Ken Orr; Arun Majumdar
Subject: Re: [ontolog-forum] Language vs Logic
Jim,
Ed, Dave, and Rich,
I
completely agree with what you say below But I will add that
proper
tools can make *many orders of magnitude* improvement --
e.g.,
going from 80 person years of tedious work to 15 person weeks
of
more exciting stuff.
I
realize that you're not going to believe what I say below, but you
can
verify it by asking Ed Yourdon. He did the initial consulting
before
recommending Arun Majumdar and Andre Leclerc for a short
study
project, which ended up in delivering exactly what a major
consulting
firm claimed would take 80 person-years by the hand
methods
that you describe.
Another
person who is familiar with the project and all parties
involved
is Ken Orr (on cc list above).
JR>>
This is OK as long as you realize that data integrity and data
semantics
are
>>
contained in the applications, that you understand these legacy systems
well
>>
enough to be sure you understand the data semantics and that you can
>>
reproduce them without error. Legacy databases are often full of codes
that
>>
are meaningless except when interpreted by the applications.
EB>
Strongly agree. Reverse engineering a "legacy" (read:
existing/useful)
>
database can be an intensely manual process. Analysis of the
>
application code can tell you what a data element is used for and how it
>
is used/interpreted. The database schema itself can only give you a
>
name, a key set, and a datatype. OK, SQL2 allows you to add a lot
of
>
rules about data element relationships, and presumably the ones that are
>
actually written in the schema have some conceptual basis.
I
also agree. But it is possible to analyze the executable code and
compare
it to *all* the English (or other NL) documentation -- that
includes
specifications, requirements documents, manuals, emails,
rough
notes, and transcriptions of oral remarks by users, managers,
programmers,
etc.
For
a brief summary of the requirements by the customer, the method
by
which Arun and Andre conducted the "study", and the results,
stored
on one CD-ROM, which were exactly what the customer wanted,
see
slides 91 to 98 of the following:
http://www.jfsowa.com/talks/iss.pdf
Just
type "91" into the Adobe counter at the bottom of the screen
to
go straight to those slides.
EB>
Reverse engineering a database is the process of converting a data
>
structure model back into the concept model that it implements. And
the
>
problem is that the "forward engineering" mapping is not one to
one from
>
modeling_language_ to implementation_language_. It is
many-to-one,
>
which means that a simple inversion rule is wrong much of the time, and
>
the total effect of the simple rules on an interesting database schema
>
is always to produce nonsense. Application analysis has the
advantage
>
of context in each element interpretation; database schema analysis is
>
exceedingly limited in that regard.
That
is part of the job, but it doesn't solve the problem of 40 years
of
legacy code with numerous patches and outdated documentation.
The
customer's problem was (1) to *verify* the mapping between
documentation
and implementation and report all discrepancies
(or
at least as many a could be found), (2) to build a glossary of
all
the English terminology with cross references to all the changes
over
the years, (3) to build a data dictionary with a specification
that
corresponded to the implementation, not to the obsolete
documentation,
and (4) to cross reference all the English terms
with
all the programming and database terms and all the changes
over
the years.
EB>
That said, other contextual knowledge can be brought to bear. If,
for
>
example, you know that the database design followed some
"information
>
analysis method" and the database schema was then
"generated"...
Good
luck when some of the programs predated any kind of "methods",
other
documentation was lost years ago, and the people who wrote
or
patched the code retired, died, moved on, or just forgot.
EB>
So, if you know the design method and believe it was used consistently
>
and faithfully, you can code a reverse mapping that is complex but
>
fairly reliable, but you still have to have human engineers looking over
>
every detail and repairing the weird things....
Arun
and Andre were the two engineers who checked anything that the
computer
couldn't resolve automatically. And the computer did indeed
discover
a lot of weird stuff. Look at slides 95 to 97 for a tiny
sample
of weird.
But
as they continued with the analysis, Arun and Andre found that
the
computer's estimate of how certain it was about any conclusion
was
usually right. They raised the threshold, so that the computer
wouldn't
ring a bell to alert them unless it was really uncertain
about
some point.
DMcD>
Most of the legacy systems we see were forward engineered once upon
>
a time, but then modified in place, without going through the original
>
model to design to code process. So you have a mix of things that
can
>
be faithfully reverse engineered mixed in with things that just got
bolted on.
Yes.
And when the code is up to 40 years old, there are a lot of
ad
hoc bolts. That's why the big consulting firm estimated that it
would
require 80 person-years to do the job. But Arun and Andre
did
it in 15 person-weeks (while the computer worked 24/7).
RC>
Personally, I have found that most AsIs DBs are useful histories
>
of how people reacted to the expressed interfaces. The code, which
>
is supposed to interpret the fields, is often not consistent with
>
the way people used the database.
That's
true. That's why you need to relate the implementation
to
*all* the documentation by users of every kind as well as
by
managers and programmers of every kind. They all have
different
views of the system, and it's essential to correlate
all
their documents and cross reference them to each other and
to
the actual implementation.
EB>
Yes, you can be stuck with maintenance programmers and ignorant
>
users. But that means you are genuinely flying blind with respect
>
to the actual data content and intent...
Yes,
that's why the customer asked the consulting firm to analyze
all
their software and all their documentation. When that estimate
was
too high, they asked Ed Yourdon for a second opinion, and he
called
in Arun and Andre. They delivered a solution that gave
the
customer everything that the big firm claimed that they would
require
80 person-years to do.
Please
read the slides. And as I said, you don't have to take
my
word for it. There is also a Cutter Consortium technical
report
written by Andre and Arun. Ask Arun for a copy. But
it
doesn't say as much about the NLP technology as I wrote
on
the iss.pdf slides.
John
_________________________________________________________________
Message
Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config
Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe:
mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared
Files: http://ontolog.cim3.net/file/
Community
Wiki: http://ontolog.cim3.net/wiki/
To
join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To
Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
-----Inline Attachment Follows-----
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
|