ontolog-forum

Re: [ontolog-forum] Accommodating legacy software

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Mon, 03 Sep 2012 21:14:15 -0400
Message-id: <504555E7.1040000@xxxxxxxxxxxxxx>
On 9/3/12 8:58 PM, David Eddy wrote:
Kingsley -

On Sep 2, 2012, at 9:54 AM, Kingsley Idehen wrote:

Have the columns you're interested in been profiled?

What does that mean?

Excuse me... should have fully spelled it out... "data profiling"


I first stepped in this particular cow-pie in 1976 when working with the Massachusetts AFDC (Aid to Families with Dependent Children) masterfile.  Turn on your wayback machine... "database" really wasn't a word in widespread use & less in actual use.  Flat files ruled.

Long story short, egg on my face, turns out there were—surprise, surprise!!—7 different values in the gender code field.  Who would have thought.  Sure wish someone had told me.  I might have looked, but did not have access to the live data.


To the best of my knowledge, this practice continues today (I've even done it to myself)... when one looks into a field/column you're likely to find pretty much anything.

Data profiling—Jack Olson wrote the book "Data Quality"   http://svaltech.com/education.html—on the subject.


I don't remember precisely, but I've been told the Canadian health care "standard" for gender code is something like 14 or 17 values.


Data profiling is a non-trivial exercise where one examines in painful statistical detail the actual—as opposed to the expected/believed—contents of fields/columns.  Tends to get ugly very quickly.
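
For readers unfamiliar with the term, a very small illustration of what profiling one field involves might look like the following. This is only a sketch: the `profile_column` helper and the sample values are made up for illustration, and real profiling tools go much further (patterns, ranges, cross-column dependencies).

```python
from collections import Counter

def profile_column(values):
    """Summarise the actual contents of one field/column."""
    def is_missing(v):
        # Treat None and blank/whitespace-only strings as missing.
        return v is None or str(v).strip() == ""

    counts = Counter(v for v in values if not is_missing(v))
    return {
        "total": len(values),
        "missing": sum(1 for v in values if is_missing(v)),
        "distinct": len(counts),
        # Most frequent values first.
        "frequencies": counts.most_common(),
    }

# Hypothetical sample of a "gender code" field (invented data).
sample = ["M", "F", "F", "m", "1", "2", "U", "", None, "F", "X"]
report = profile_column(sample)

print(report["distinct"], "distinct values,", report["missing"], "missing")
for value, n in report["frequencies"]:
    print(repr(value), n)
```

The point of the exercise is the gap between the two values one expects and the frequency table that actually comes back.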

If you're tossing a bunch of RDBMS data into BAGs, I'm a little surprised you've not encountered this issue.

Of course I've encountered this kind of problem, and RDBMS to RDF-based Linked Data mapping doesn't imply dumb mapping. The beauty of semantics (in this regard) is that you are mapping from the semantically impoverished RDBMS to a semantically rich view (transient or materialized). The quality of the semantic mapping should be limited only by the modelling prowess of the mapper (human and/or machine).




real stuff where you have data objects derived from a relational schema with base semantics in play that reflect the mundane semantics of an RDBMS en route to enhancing the resulting view (be it transient or materialized) using additional semantics
 
I suspect that my use of semantics is likely different than yours.  For certain mine has no grounding whatsoever in OWL or RDF.

OWL enables you to enhance entity relationships with fine-grained semantics. It isn't the only way, but that's the purpose it serves in this context.


What does "relational schema with base semantics" mean in your context?  I think the tricky word here is "base."

I am referring to ontology generation from an RDBMS schema. Such an ontology can be used as the basis for additional semantic richness by mapping to other ontologies. The entire game is about relationship semantics and their ultimate use as power facilitators for reasoning at data query (typically backward-chained) and generation time (typically forward-chained).
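
As a rough illustration of deriving such a base ontology from a schema, here is a minimal sketch using an in-memory SQLite database. Everything in it is an assumption for illustration: the `http://example.org/schema/` namespace, the two sample tables, and the naming rule (table becomes a class, column becomes a property, foreign key becomes an object property). It is not a description of any particular product's mapping.

```python
import sqlite3

BASE = "http://example.org/schema/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

def schema_to_triples(conn):
    """Derive class/property triples from tables, columns and foreign keys."""
    triples = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        cls = BASE + table
        triples.append((cls, RDF_TYPE, OWL + "Class"))
        # foreign_key_list rows: id, seq, table, from, to, ...
        fk_target = {row[3]: row[2] for row in
                     conn.execute(f'PRAGMA foreign_key_list("{table}")')}
        # table_info rows: cid, name, type, notnull, dflt_value, pk
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            col = row[1]
            prop = f"{BASE}{table}.{col}"
            if col in fk_target:
                triples.append((prop, RDF_TYPE, OWL + "ObjectProperty"))
                triples.append((prop, RDFS + "range", BASE + fk_target[col]))
            else:
                triples.append((prop, RDF_TYPE, OWL + "DatatypeProperty"))
            triples.append((prop, RDFS + "domain", cls))
    return triples

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT,
    dept_id INTEGER REFERENCES department(id)
);
""")

triples = schema_to_triples(conn)
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

The output is the "base" layer only; mapping these generated terms to subject-matter ontologies is a separate step.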


What does "mundane semantics" mean?

Foreign key relationships and isA (type) relationships are mundane. Equivalence by name (denotation) or by value (inverse-functional), with symmetry enforced and full transitive closure applied at scale, isn't mundane.
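
To make the equivalence point concrete, here is a toy sketch of symmetric, transitive equivalence using a union-find structure. The identifiers, the email values, and the choice of email as the inverse-functional key are all invented for illustration, and union-find is just one common way to compute such a closure; it says nothing about how a given store implements it at scale.

```python
class EquivalenceClosure:
    """Union-find: equivalence is symmetric and transitive by construction."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def same(self, a, b):
        return self.find(a) == self.find(b)

eq = EquivalenceClosure()

# Equivalence by value: records sharing an inverse-functional key
# (here a hypothetical email column) denote the same entity.
records = [
    ("crm:cust/42", "k@example.org"),
    ("web:user/kjones", "k@example.org"),
    ("billing:acct/9001", "b@example.org"),
]
first_seen = {}
for ident, email in records:
    if email in first_seen:
        eq.union(first_seen[email], ident)
    else:
        first_seen[email] = ident

# Equivalence by name: an explicitly asserted link between two identifiers.
eq.union("billing:acct/9001", "crm:cust/42")

# Neither pair below was asserted directly; both follow from the closure.
print(eq.same("web:user/kjones", "billing:acct/9001"))
print(eq.same("crm:cust/42", "hr:emp/7"))
```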





How do you KNOW what's in that legacy Silo?

Of course I do. How else would that work?

How would you know?

Ask the manager who hired you?   Riiiiiiight!

Of course not. You can derive basic (mundane) entity relationship information from an RDBMS schema, and then map it to subject-matter or domain-specific ontologies.


Look at the column names?  Double riiiiiiiiiight.

Look at the data itself & assume it has intrinsic meaning?  Triple riiiiiiight!!

See my comments above. Technology has really moved on, seriously.




And this is just the DATA... which I am NOT interested in. 

I am not talking about the raw data modulo semantics. I am talking about structured data enhanced with relationship semantics.


Can anything in the SW stack do something useful for the systems—legacy or otherwise—themselves?

Yes, of course.

 By this I mean the 100s/1000s of software languages used to write the systems that produce the data you're dealing with.

RDBMS is nice, but what about flat files, IDMS, IMS, M204, S2000, Adabas, etc., that are ALL in active use.

Likewise, most are accessible via one of the following: ODBC, JDBC, ADO.NET, OLE-DB, JDO, SOA Web Services, SOAP Services, or RESTful interaction patterns (e.g., via HTTP). Once the data is accessible, it becomes much more malleable.

Kingsley

- David



 
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
 


-- 

Regards,

Kingsley Idehen	      
Founder & CEO 
OpenLink Software     
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen






