Dear David,
I hadn’t thought of using Dublin
Core to represent software “documents” which normally have
software-specific support tools for version control, compiler requirements,
test case documents, and so on. It seems to me that Dublin Core is really
only intended to replace the library card catalog for which it was
designed.
As for software document standards, the
DoD 2167 specs and their descendants seem very well formulated for keeping
track of software engineering documents – Requirements, Units, Packages, Designs,
Tests, and all that other stuff. Those standards were expensive to
conform to, but they also made the software being developed more fit for its
eventual role as just more legacy software on top of the old stuff.
I gave Dublin Core as an example to emphasize how it fits the OOD philosophy of building small bottom-level components that may prove useful beyond the first few applications, much as the Delphi component palette did. But I don’t mean to read more into Dublin Core than was intended to be there in the first place.
You wrote:
Yes, software does have its odd/unique/strange
properties, but it's not that unique. Author(s), Date created, Date last
maintained. Most of the 15 DCMI attributes look directly useful for keeping
track of software. The trick, of course, is what values actually go into the
attributes. How can that be automated?
Yes, all the 2167++ documents could be described in Dublin Core for tracking who, what, when, where, why, and whatever, but they don’t treat the substance of software, such as the roles that each type of SW document plays in a development project.
So DCMI could possibly be one of those bricks, one that has to be augmented by lots more bricks that have knowledge of the development waterfall. So Dublin Core is certainly not adequate to represent all the things that a well-engineered SW project needs. It is simply one small brick in a haystack of bricks.
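To make that concrete, here is a minimal sketch (mine, not anything from the DCMI spec beyond the 15 element names) of wrapping a Dublin Core record around one source file and then bolting on a software-specific brick. The SoftwareBrick fields are hypothetical illustrations, not any standard:

# Minimal sketch: a Dublin Core record for a source file, plus a
# hypothetical software-specific extension. Only the 15 DC element
# names come from the DCMI standard; everything else is illustrative.
from dataclasses import dataclass, field

DCMI_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

@dataclass
class DublinCoreRecord:
    elements: dict = field(default_factory=dict)

    def set(self, element, value):
        if element not in DCMI_ELEMENTS:
            raise KeyError(element + " is not one of the 15 DCMI elements")
        self.elements[element] = value

@dataclass
class SoftwareBrick:
    # The waterfall knowledge that Dublin Core itself does not carry.
    lifecycle_role: str          # e.g. "Requirement", "Unit", "Test"
    compiler_requirements: str
    version_control_id: str

record = DublinCoreRecord()
record.set("title", "payroll_calc.cob")
record.set("creator", "J. Smith")
record.set("date", "2012-02-26")
record.set("format", "text/x-cobol")

brick = SoftwareBrick(
    lifecycle_role="Unit",
    compiler_requirements="COBOL-85",
    version_control_id="rev 42",
)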
But your question, “How can that be automated?”, is best answered by my patent (attached), with which you are somewhat familiar, I trust. That patent document shows how to partition, aggregate, and cluster the concepts represented in a legacy database, both those that were and those that were not anticipated in the original development project.
You have probably had occasion to look over a database and been surprised at how varied the rows were. The patent describes methods to recognize the variations as well as the commonalities. In practice, the variations are due to the observers’ different ways of thinking about the domains they were working with. The database design was intended to fit the expected data entry, but the actual data entry was, of course, completely different from what the designers had intended.
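The patent itself is the authoritative description, but as a rough illustration only (my toy sketch here, not the patented method), one way to start surfacing those variations is to cluster rows by their “shape”, i.e. the set of columns each row actually populates:

# Toy sketch (NOT the patented method): group rows of a legacy table
# by which columns they actually fill in. Rows with the same shape
# tend to reflect one way of thinking about the domain; different
# shapes expose the variations.
from collections import defaultdict

def cluster_by_shape(rows):
    # rows: list of dicts mapping column name -> value (or None)
    clusters = defaultdict(list)
    for row in rows:
        shape = frozenset(c for c, v in row.items() if v not in (None, ""))
        clusters[shape].append(row)
    return clusters

rows = [
    {"name": "Acme", "phone": "555-1212", "fax": None},
    {"name": "Bolt", "phone": None, "fax": "555-9999"},
    {"name": "Core", "phone": "555-0000", "fax": None},
]
for shape, members in cluster_by_shape(rows).items():
    print(sorted(shape), "->", len(members), "row(s)")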
So in my opinion, that patent describes methods that are useful in reconciling the actual (AsIs) database with the (ToBe) database documented at design time.
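In the same illustrative spirit (again my sketch, not the patent’s method), reconciling AsIs against ToBe can begin as a simple comparison of the documented columns with the columns the data actually uses:

# Illustrative sketch: compare the ToBe (designed) schema with AsIs
# (observed) usage. Dead columns and undocumented ones are the first
# clues that data entry diverged from the design.
def reconcile(to_be_columns, rows):
    used = {c for row in rows for c, v in row.items() if v not in (None, "")}
    return {
        "documented_but_unused": set(to_be_columns) - used,
        "used_but_undocumented": used - set(to_be_columns),
    }

to_be = {"name", "phone", "fax", "telex"}   # design-time schema
rows = [
    {"name": "Acme", "phone": "555-1212", "fax": None},
    {"name": "Bolt", "fax": "555-9999", "email": "b@bolt.example"},
]
print(reconcile(to_be, rows))
# {'documented_but_unused': {'telex'}, 'used_but_undocumented': {'email'}}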
HTH,
-Rich
Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of David Eddy
Sent: Sunday, February 26, 2012 11:13 AM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] What goes into a Lexicon?
Rich -
On Feb 26, 2012, at 12:19 PM, Rich Cooper
wrote:
All financial justification is from the application up, not from the
philosophy down.
We're a choir... probably of two.
I suspect it would be quite easy to prove the financial benefit.
Take the same difficult-to-understand code & have two similarly
experienced programmers, with no knowledge of this code, make changes.
Only difference is that Programmer A has to figure out the language
themselves, while Programmer B has a language list/lexicon/ontology/cheat
sheet/dictionary/glossary that explains what the cryptic language means.
Without even doing the experiment you know Programmer B has a huge
advantage.
Let's see... neither you nor I read French. We're both given a
document to translate. You do not have a dictionary. I do.
Who wins?
The only difference is the cost of providing the language list in an
accessible form... but amortized over 20-30 years, that cost goes close to
zero. Besides, we're only dealing with, at the outside, maybe 1,000 terms
for a large application.
The interesting challenge is how to discover the core application
language as expressed in COBOL, EasyTrieve, Focus, BAL, etc. & make it
available to the Java, JSON, Objective-C, PHP, Perl, etc. generation of
programmers. I would argue that the new generation of programmers is
juggling new labels, not new data. May come as a revelation that
soc_sec_no and socialSecurityNumber just might possibly represent the same
thing?
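(A minimal sketch of that label-matching idea, with an invented glossary; a real one would be harvested from the application's own code and copybooks. The trick is to normalize both spellings to the same token sequence.)

# Sketch: match old and new identifier labels by normalizing them.
# The abbreviation glossary below is invented for illustration.
import re

GLOSSARY = {"soc": "social", "sec": "security", "no": "number", "num": "number"}

def tokens(identifier):
    # Split camelCase, then snake_case, then expand known abbreviations.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier)
    words = re.split(r"[\s_]+", spaced.lower())
    return tuple(GLOSSARY.get(w, w) for w in words if w)

assert tokens("soc_sec_no") == tokens("socialSecurityNumber")
# both normalize to ('social', 'security', 'number')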
Rich Cooper wrote:
That is why Dublin Core
succeeded. It is small, very simple, well documented, and matched by
large hunks of examples that describe document provenance within its
limits. In that sense, Dublin Core is a new legacy just as the software
which interfaces with it might be a legacy database.
One thing that distresses me about Dublin Core is that to the best of
what I've been able to find (it is a big world), Dublin Core does not regard
software source code as a document. Best I've been able to find is that
if the source code were printed out (huh? WHY would you do
that?), then it would be a document. I haven't been able to figure
that out. If on paper, it's a document; if just electrons, then not.
If useless, then document. If useful, not document.
Yes, software does have its odd/unique/strange properties, but it's not
that unique. Author(s), Date created, Date last maintained. Most of
the 15 DCMI attributes look directly useful for keeping track of software.
The trick, of course, is what values actually go into the attributes.
How can that be automated?
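(One partial answer, sketched here on the assumption that the source lives in a Git repository; other version-control systems record the same facts. Several DCMI values are already captured by the tooling and can be pulled out mechanically.)

# Sketch: harvest candidate values for a few DCMI attributes from
# version control, assuming the file lives in a Git repository.
import subprocess

def dcmi_from_git(path):
    def git(*args):
        return subprocess.check_output(("git",) + args, text=True).strip()

    # git log prints newest first; the last "A" (added) entry is the
    # original creation, even if the file was later re-added.
    creator = git("log", "--diff-filter=A", "--format=%an", "--", path).splitlines()[-1]
    created = git("log", "--diff-filter=A", "--format=%aI", "--", path).splitlines()[-1]
    contributors = sorted(set(git("log", "--format=%an", "--", path).splitlines()))

    return {
        "creator": creator,
        "date": created,
        "contributor": contributors,
        "identifier": git("hash-object", path),  # content hash as identifier
    }

print(dcmi_from_git("payroll_calc.cob"))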