
Re: [ontolog-forum] Accommodating legacy software

To: "[ontolog-forum] " <ontolog-forum@xxxxxxxxxxxxxxxx>
From: David Eddy <deddy@xxxxxxxxxxxxx>
Date: Mon, 3 Sep 2012 20:58:14 -0400
Message-id: <86708482-4EE1-49E8-9E4E-194523C9D27D@xxxxxxxxxxxxx>
Kingsley -

On Sep 2, 2012, at 9:54 AM, Kingsley Idehen wrote:

>> Have the columns you're interested in been profiled?
>
> What does that mean

Excuse me... should have fully spelled it out... "data profiling"

I first stepped in this particular cow-pie in 1976 when working with the Massachusetts AFDC (Aid to Families with Dependent Children) masterfile. Turn on your wayback machine... "database" really wasn't a word in widespread use, and it was even less in actual use. Flat files ruled.

Long story short, egg on my face: it turned out there were (surprise, surprise!!) 7 different values in the gender code field. Who would have thought? Sure wish someone had told me. I might have looked, but I did not have access to the live data.

To the best of my knowledge, this practice continues today (I've even done it to myself)... when you look into a field/column, you're likely to find pretty much anything.
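One common way to see "pretty much anything" in a column is pattern profiling: reduce every value to a shape (letters become `A`, digits become `9`) and count the shapes. The sketch below is a minimal illustration assuming the column has already been pulled out as a list of strings; the function names and the sample date values are hypothetical, not taken from any particular profiling tool.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Map each character to a class: letters -> 'A', digits -> '9', others kept as-is."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def pattern_profile(column):
    """Count how often each character pattern occurs in a column of strings."""
    return Counter(value_pattern(v) for v in column)


# Hypothetical contents of a column documented as a six-digit date "MMDDYY".
dates = ["090312", "0903 2", "09/03/12", "UNKNOWN", "", "120903"]
for pattern, count in pattern_profile(dates).most_common():
    print(repr(pattern), count)
```

Only two of the six sample values match the documented `999999` shape; the rest (a stray space, slashes, a text placeholder, an empty string) are exactly the kind of surprise that reading the column definition will not reveal.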

On data profiling: Jack Olson wrote the book on the subject, "Data Quality" (http://svaltech.com/education.html).

I don't remember precisely, but I've been told the Canadian health care "standard" for gender code is something like 14 or 17 values.

Data profiling is a non-trivial exercise in which one examines, in painful statistical detail, the actual (as opposed to the expected or believed) contents of fields/columns. It tends to get ugly very quickly.
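The basic statistics behind a column profile (row count, blank count, distinct count, value lengths, value frequencies) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Olson's method or any product's API: the column is assumed to be an in-memory list, and `profile_column`, `ColumnProfile`, and the sample gender codes are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    rows: int             # total values examined
    blank: int            # None, empty, or whitespace-only values
    distinct: int         # distinct non-blank values (after stripping whitespace)
    min_len: int          # shortest non-blank value
    max_len: int          # longest non-blank value
    frequencies: Counter  # non-blank value -> occurrence count


def profile_column(values):
    """Summarize what a column actually contains, not what it is documented to contain."""
    cleaned = []
    blank = 0
    for v in values:
        if v is None or str(v).strip() == "":
            blank += 1
        else:
            cleaned.append(str(v).strip())
    freqs = Counter(cleaned)  # case is kept as-is, so 'm' and 'M' count separately
    lengths = [len(v) for v in cleaned]
    return ColumnProfile(
        rows=len(values),
        blank=blank,
        distinct=len(freqs),
        min_len=min(lengths, default=0),
        max_len=max(lengths, default=0),
        frequencies=freqs,
    )


# Hypothetical gender-code column that is "supposed" to hold only 'M' or 'F'.
gender = ["M", "F", "F", "m", "U", " ", None, "X", "1", "M", "FEMALE", "0"]
p = profile_column(gender)
print(p.rows, p.blank, p.distinct, p.min_len, p.max_len)
for value, count in p.frequencies.most_common():
    print(repr(value), count)
```

Case is deliberately not folded: finding both `m` and `M` is itself a profiling result worth reporting, and normalizing it away would hide the very inconsistency the exercise is meant to expose.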

If you're tossing a bunch of RDBMS data into BAGs, I'm a little surprised you've not encountered this issue.

> real stuff where you have data objects derived from a relational schema with base semantics in play that reflect the mundane semantics of an rdbms en route to enhancing the resulting view (be transient or materialized) using addition semantics

I suspect my use of "semantics" differs from yours. Mine certainly has no grounding whatsoever in OWL or RDF.

What does "relational schema with base semantics" mean in your context?  I think the tricky word here is "base."

What does "mundane semantics" mean?

>> How do you KNOW what's in that legacy Silo?
>
> Of course I do. How else would that work?

How would you know?

Ask the manager who hired you?   Riiiiiiight!

Look at the column names?  Double riiiiiiiiiight.

Look at the data itself & assume it has intrinsic meaning?  Triple riiiiiiight!!

And this is just the DATA... which I am NOT interested in.  

Can anything in the SW (Semantic Web) stack do something useful for the systems themselves, legacy or otherwise? By this I mean the 100s/1000s of software languages used to write the systems that produce the data you're dealing with.

RDBMS is nice, but what about flat files, IDMS, IMS, M204, S2000, Adabas, etc., etc., that are ALL in active use?

- David

