Hi Ed, Dave, Jim, David, et al, comments below,
-Rich
Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2
-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Ed Barkmeyer
Sent: Tuesday, September 14, 2010 8:41 AM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Language vs Logic
Jim Rhyne wrote:
> This is OK as long as you realize that data integrity and data
> semantics are contained in the applications, that you understand
> these legacy systems well enough to be sure you understand the data
> semantics, and that you can reproduce them without error. Legacy
> databases are often full of codes that are meaningless except when
> interpreted by the applications.
>
Strongly agree. Reverse engineering a "legacy" (read: existing/useful)
database can be an intensely manual process. Analysis of the
application code can tell you what a data element is used for and how
it is used/interpreted. The database schema itself can only give you a
name, a key set, and a datatype. OK, SQL2 allows you to add a lot of
rules about data element relationships, and presumably the ones that
are actually written in the schema have some conceptual basis.
Personally, I have found that most AsIs DBs are useful histories of how
people reacted to the expressed interfaces. The code, which is supposed
to interpret the fields, is often not consistent with the way people
used the database.

Also, the reason a new app is being built is normally (not always)
because the old app is out of date and upgrading the code itself is too
expensive, so a new ToBe app is a justified development expense.
But let's distinguish between the executable code and the stored data:
the data is how people reacted to the expressed interface, while the
code is how the programmers and maintenance team reacted to complaints
of functional inappropriateness. The code is nearly never worth
keeping, though you might get a few juicy routines out of thousands to
millions of lines of code. The stored data can be informative, but it
probably can't be used intact in the ToBe system. It is, however, a
useful trace of requirements information showing how the users actually
perceived the system. That, IMHO, is its major contribution to the
ToBe system.
Reverse engineering a database is the process of converting a data
structure model back into the concept model that it implements. And the
problem is that the "forward engineering" mapping is not one-to-one
from modeling _language_ to implementation _language_. It is
many-to-one, which means that a simple inversion rule is wrong much of
the time, and the total effect of the simple rules on an interesting
database schema is always to produce nonsense. Application analysis has
the advantage of context in each element interpretation; database
schema analysis is exceedingly limited in that regard.
Agreed. Also, the sheer volume of data, especially when informed by the
timeline of data entry, can help the new development team understand
actual usage in a way that can be translated into performance
requirements for the ToBe system, as the sketch below illustrates. But
the data itself is hardly ever usable as-is in the new system.
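
For example, a minimal sketch in Postgres-style SQL, assuming a
hypothetical legacy_orders table with a created_dt timestamp column:

  -- Monthly row counts hint at the load the ToBe system must sustain.
  SELECT DATE_TRUNC('month', created_dt) AS month,
         COUNT(*)                        AS rows_added
  FROM   legacy_orders
  GROUP  BY DATE_TRUNC('month', created_dt)
  ORDER  BY month;

Row counts per period are a crude but honest proxy for transaction
volume, which is exactly what the ToBe sizing exercise needs.
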
That said, other contextual knowledge can be brought to bear. If, for
example, you know that the database design followed some "information
analysis method" and the database schema was then "generated" (even if
by hand) according to the principles of that method, then you may be
able to recognize "entity" tables and "relationship" tables and
"attributes" and "dependencies" and "code" data types and "date" data
types, and so on. And if the schema rules are also written according to
the methodology, they can help. But a different analysis/design method
may beget very similar structures with a different set of conventions
for naming and relationship representation. For example, is a foreign
key attribute in an entity table an existential dependency or just a
"functional" (1..1 or 0..1) relationship? And is a subclass represented
by a separate table, or by a code attribute (type name) if it has no
local attributes (as distinct from local relationships)? And there is
always the question of what "null permitted" means. (Remember Ted
Codd's "kinds of nothing"?)
So, if you know the design method and believe it was used consistently
and faithfully, you can code a reverse mapping that is complex but
fairly reliable, but you still have to have human engineers looking
over every detail and repairing the weird things.
Agreed, with emphasis.

Further, the human engineers must be familiar with the application
domain, and have access to the business experts and some of the
software engineers.

Those engineers, business managers, and even users are seldom the same
people who built the AsIs system, which was likely done years before.
Turnover in SWE is much higher now than it was years ago, so expertise
with the old system is often very hard to find.
All of this translates to a full-blown software engineering project
with some assistance from software analysis tools. I'm not sure how
much easier that is than using software analysis tools on the
applications, and in this day and age there is no reason not to use
both sets of tools, especially if the providers have tool sets that
work together.
-Ed
P.S. OMG has a whole gang of software analysis tool vendors making
standards for interchange of the analytical results, because none of
them alone ever has the right set of capabilities for any major
customer. They are the Software Assurance Group and the
"Architecture-Driven Modernization" Task Force. (The latter is ADM, a
play on the OMG "MDA" engineering approach, because they do _reverse_
engineering.)
Since I am not familiar with the very latest commercial tools for the
latest DB and SWE representations, you may be right. But the tools are
mostly available for development, not for analysis of code. With even a
few years between the AsIs and ToBe developments, the underlying
software technology changes so fast that the old methods are discarded
in most new ToBe systems for commercial purposes.
-Rich
--
Edward J. Barkmeyer                     Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263             Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263             FAX: +1 301-975-4694

"The opinions expressed above do not reflect consensus of NIST,
and have not been reviewed by any Government authority."
>
> Jim Rhyne
> Software Renovation Consulting
> Los Gatos, California
> http://www.enterprisesoftwarerenovation.com/
>
>
>
> -----Original Message-----
> From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
> [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Rich Cooper
> Sent: Sunday, September 12, 2010 2:04 PM
> To: '[ontolog-forum] '
> Subject: Re: [ontolog-forum] Language vs Logic
>
> Hi David,
>
> You are right-on with a realistic view of how this will progress, IMHO.
>
> But instead of reverse engineering legacy systems, consider projects
> to reverse engineer legacy DATABASEs. That is a whole lot more
> effective and way less expensive. Also, it happens to be one subject
> discussed in my patent at
> http://www.englishlogickernel.com/Patent-7-209-923-B1.PDF, which I
> mentioned earlier.
>
> By reverse engineering the database, you can still use whatever
> remains useful of the old data model, the AsIs version. It helps
> define what the users were actually typing into those fields, just in
> case the new design team wants to know how the users viewed each
> field, and some of the timing and volume measurements can be helpful
> in estimating performance for the new database, the ToBe version.
>
> Still more information can be reconstructed by analysis of the
> domains actually represented in the data, where often surprising
> correlations are found. The old Information Flow Framework showed
> some insightful ways to look at actual domain sample vectors, or at
> least this interpretER saw it that way.
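>
> A sketch of such domain profiling (the table and column names are
> hypothetical):
>
>   SELECT cust_typ_cd, COUNT(*) AS freq
>   FROM   legacy_customer
>   GROUP  BY cust_typ_cd
>   ORDER  BY freq DESC;
>
> The values actually present, and their frequencies, often describe a
> domain the schema never declared.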
>
> -Rich
>
> Sincerely,
> Rich Cooper
> EnglishLogicKernel.com
> Rich AT EnglishLogicKernel DOT com
> 9 4 9 \ 5 2 5 - 5 7 1 2
>
> -----Original Message-----
> From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
> [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of David Eddy
> Sent: Sunday, September 12, 2010 1:41 PM
> To: [ontolog-forum]
> Subject: Re: [ontolog-forum] Language vs Logic
>
> Pat -
>
> On Sep 12, 2010, at 12:19 AM, Patrick Cassidy wrote:
>
> Context for the group... & reminding Pat, since he's probably
> forgotten he said this...
>
> I am holding Pat to his statement at a SICoP meeting in approx 2005
> where he said (approximately) that unless "this magic" (e.g.
> ontologies, etc.) was somehow delivered & made accessible to folks in
> the trenches who have zero knowledge, interest or education in
> ontologies, ontologies would be nothing more than an interesting
> academic exercise.
>
>
>
> CONTEXT... I am interested in the potential use of ontology for the
> development/maintenance of software applications.
>
> I am increasingly coming to the conclusion that ontologies are simply
> NOT relevant to this task.
>
> Please tell me I'm using the wrong lance to tilt at the wrong
> windmill. It won't hurt my feelings.
>
>
>
>
>> Figuring out precisely what a term in an ontology is supposed to
>> mean has three aspects: what the person developing the ontology
>> intends it to mean; what the person reading the documentation
>> interprets it to mean; and what the computer executing a program
>> using the ontology interprets it to mean. Ideally, they will be the
>> same, but they may differ.
>>
>
> I would argue that since these are highly likely to be three
> different people, with all the differing experiences, perspectives &
> languages that humans tote around as life baggage, "they WILL
> differ," not may.
>
> Granted my interest in systems development & maintenance may be too
> narrow, I would also argue there are far more people wrestling with
> systems development/maintenance language challenges than people
> building ontologies.
>
>
>
>
>> so good documentation is critical for ontologies intended to be
>> used by more than a small, tightly connected group.
>>
>
> My money is on the ONLY accurate documentation being the source code
> (assuming, of course, you can find the correct version). In
> commercial applications, what paper documentation exists may have
> been accurate at one point, but if the system has been in use, the
> code is the only accurate record. [I'd like to think weapons systems
> & nuclear power plants hold to a higher standard, but I have no
> experience there.]
>
> This is in fact one of the great language challenges... as a system
> transitions from paper specifications & documentation through
> development into production, and on to new teams of project managers
> & developers (whose native language is likely NOT English), the
> intent of the original language begins to mutate, since there is no
> formal process to ensure subsequent generations of maintainers
> (project managers & coders) continue to use the same language &
> meanings.
>
> Whereas the compiler will force you to use correct VERBS, there is no
> such constraint on the NOUNS... which is why organizations end up
> with literally hundreds of names/nouns for the same thing.
>
> The CD/CDE (as abbreviation for CODE) example is from just such an
> experience. The original IMS DBA enforced CD as the single correct
> abbreviation for several years in the initial system-building phase.
> She left & a new DBA took over. A new segment was added & he
> evidently liked to abbreviate CODE as CDE. There was no automated
> mechanism like a compiler to ensure or "encourage" him to use CD
> rather than CDE. The problem comes when one searches for "-CD " (note
> the space suffix, since CD was used as a suffix in data element
> names): the search will NEVER find "-CDE ". The devil is in the
> details.
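>
> A sketch of that failure mode, assuming a hypothetical
> data_dictionary table of element names:
>
>   -- Finds CUST-TYP-CD but silently misses ORD-STAT-CDE:
>   SELECT elem_name
>   FROM   data_dictionary
>   WHERE  elem_name LIKE '%-CD';
>
> The query is syntactically perfect and semantically wrong, and
> nothing warns you.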
>
> In a system that adheres to "good names," one learns that the name of
> something & what it is are in fact the same. In the physical world
> there are multiple forces (the dairy, the food inspectors, the
> grocery store) to ensure a jug labeled "milk" actually contains milk.
> We haven't quite learned this lesson yet in systems.
>
>
>
>
>> For me, good documentation means to state what one intends the
>> ontology element to mean,
>>
>
> The way you present this, I interpret as saying the ontology needs to
> be done BEFORE the system.
>
> This is, of course, not acceptable, since the vast majority of
> systems are up & running & have been built/maintained without any
> consideration at all of an ontology(s).
>
> I don't consider reverse engineering ontologies from existing systems
> to be practical. Primary argument... since the system owner does not
> consider it cost effective to maintain accurate, current
> documentation, they're certainly not going to spend money/time on
> reverse engineering an ontology. I also factor in that the "reality"
> I look at is a small organization of 10,000 people with 900 systems.
> Last year ComputerWorld said IBM, with 400,000 people, had 4,500
> "applications" (same as/different from systems? ...who knows).
>
> I am at pains to point out that each one of these "applications"
> (whatever an application is) was built by different people at
> different times for different objectives. Then maintained by
> different people... all these actors bringing different language to
> the task.
>
>
>
>
>> To some extent, learning to use a logic-based ontology is similar
>> to learning to use a new object-oriented programming language, but
>> programming languages usually come with a library of immediately
>> useful applications as learning examples. We haven't reached that
>> point yet in the technology of ontology creation and dissemination.
>>
>
>
> Long, long ago I was beginning to work on my last programming
> assignment. I angered the architect (not a word in use then) by
> telling him I did not want to LEARN CICS (at that point a HOT
> language), rather I wanted to USE it. It took about 10 years, but he
> finally came around to understanding what I was saying. His templates
> (what we'd call frameworks today) were absolutely brilliant. From a
> standing start (e.g. knowing absolutely nothing about CICS), I was
> able to take his templates & get 17 CICS programs working in 2 weeks.
>
> Twenty years later I was looking at a cross-platform development
> tool... and was astonished to find a template/framework tool for
> $350. The earlier templates probably cost the client $500,000+.
>
> This is the standard I hold an ontology tool to... it had better not
> be any more complex than a spell checker/dictionary. Clearly there's
> a ways to go.
>
>
>
>
>> For the time being, I look first at the logical axioms associated
>> with a term in an ontology, then at the documentation (usually
>> contained in the "comments" section of an ontology element)
>>
>
> You keep using words that are difficult to grok...
> "documentation"? "comments"? They are outside my experience. :-)
>
>
> Here's what I consider to be documentation...
>
> a = b * c
>
> Totally accurate & not very useful. More precisely... USELESS!
> Unfortunately there's a lot of this.
>
> The same logical statement:
>
> wkly-pay = hrs-wkd * rate-pay
>
> Is now potentially comprehensible.
>
> If I can determine this code is in a payroll module, then I'm going
> to assume that "pay" is likely a dollars & cents amount. If I can
> deduce this from just the name, without needing to ask someone or
> fish around in some questionable documentation, then I'm a happy
> camper.
>
> But what I would really like is the ability to hot-key/right-click on
> these variables & see what they mean. I think this look-up facility
> is possible in modern editors like Eclipse... but someone has to dig
> up what the words mean in the context of their specific use... which
> may or may not say anything about their meaning somewhere else in the
> system.
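>
> The database layer already has a hook for part of this (a sketch in
> Postgres-style SQL; the column and comment text are invented):
>
>   COMMENT ON COLUMN payroll.rate_pay IS
>     'Hourly pay rate in US dollars and cents';
>
> An editor could surface that comment on a right-click, but only if
> someone recorded the meaning in the first place.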
>
> ___________________
> David Eddy
> deddy@xxxxxxxxxxxxx
>
> 781-455-0949