On Feb 12, 2014, at 10:59 AM, Kingsley Idehen wrote:
What characteristics distinguish a Data Dictionary, Ontology, and Vocabulary?
Oh, boy!
There will be a lot of variety here. As always, a journey through history. I will get to OntVoc.
Note: I regard "data dictionary" & "metadata repository" as very close if not identical synonyms. Others do not. There can be tremendous variety in how these tools are used. Six blind men & the elephant & all that.
As I hope we've all noticed, "repository," which used to be a very special word, is now morphing to mean "database."
For me a data dictionary is a very specific software product.
The story, as I know it, begins in the mid 1960s (I believe Cincom & Adabas were available; IMS appeared in 1969), when it was observed that these database things were clearly going to be so complex & have so many artifacts that it would be necessary to have a specialized bill-of-materials database to keep track of the actual DBMS applications, their artifacts & relationships.
For a while the data dictionary was very popular, as application construction exploded in the 1970s & 1980s (always nice to have the advantage of green fields). Unfortunately, over the past 49 years the survival rate has been under 5%. Many have tried, few have succeeded. And systems continue to get ever more complex & unwieldy.
Data dictionary was relabeled to metadata repository with IBM's short-lived AD/Cycle, RepositoryManager effort in the late 1980s. Blink & you'd miss it.
The classic data dictionary market, the inverted list DBMS kind, essentially ceased to exist (I'm only speaking of Fortune 500 with 1,000s of applications) by the early 1990s. There was a brief popularity of repositories done in RDBMS through the late 1990s... also pretty much gone to dust. A data dictionary that supports a single RDBMS will live an isolated life which is out of sync with its purpose as a bill-of-materials database. To a good data dictionary a VSAM file declaration & a DB2 table schema (or Oracle) are very much the same thing. Just different packaging.
Basic structure of a dictionary:
Enterprise is comprised of Applications.
Applications are comprised of Programs.
Programs are comprised of Data Structures, Linkages, & Logic.
Data Structures contain Data.
Data contains Data Definitions.
Certainly plenty of room for quibbling & clarification of definitions here.
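To make the bill-of-materials idea concrete, here is a minimal sketch of that hierarchy in Python. It is only an illustration, not how any actual dictionary product was built. The class names follow the list above; the `packaging` label, the `where_used` helper & the sample application, program & structure names are all my own hypothetical additions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataDefinition:
    text: str


@dataclass
class Data:
    name: str
    definitions: List[DataDefinition] = field(default_factory=list)


@dataclass
class DataStructure:
    name: str
    packaging: str  # hypothetical label, e.g. "VSAM file" or "DB2 table"
    data: List[Data] = field(default_factory=list)


@dataclass
class Program:
    name: str
    data_structures: List[DataStructure] = field(default_factory=list)
    linkages: List[str] = field(default_factory=list)
    logic: List[str] = field(default_factory=list)


@dataclass
class Application:
    name: str
    programs: List[Program] = field(default_factory=list)


@dataclass
class Enterprise:
    name: str
    applications: List[Application] = field(default_factory=list)


def where_used(enterprise: Enterprise, data_name: str) -> List[Tuple[str, str, str]]:
    """Walk the hierarchy & list every (application, program, structure)
    that contains a data item with the given name."""
    hits = []
    for app in enterprise.applications:
        for prog in app.programs:
            for struct in prog.data_structures:
                if any(d.name == data_name for d in struct.data):
                    hits.append((app.name, prog.name, struct.name))
    return hits


# Hypothetical sample population: the same item in two kinds of packaging.
master_file = DataStructure(
    "POLICY-MASTER", "VSAM file",
    [Data("MSTR-POL-NO", [DataDefinition("Policy number")])],
)
policy_table = DataStructure("POLICY", "DB2 table", [Data("MSTR-POL-NO")])
enterprise = Enterprise("ExampleCo", [
    Application("Billing", [Program("BILL001", [master_file])]),
    Application("Claims", [Program("CLM001", [policy_table])]),
])

print(where_used(enterprise, "MSTR-POL-NO"))
```

Note that the walk never looks at `packaging`: the VSAM file & the DB2 table are treated the same way, which is the whole point of a dictionary that spans more than one DBMS.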
In the context of current expectations of what an ontology is or is not, I have absolutely no idea where or how or if an ontology would be woven into a data dictionary. It's rather a moot point, of course, since for all practical purposes ENTERPRISE-scale data dictionaries simply do not exist. [It is my understanding there are application-focused data dictionaries, but I'm not looking at those. Small company with 10,000 people has 1,000 applications with 1,000 data dictionaries...? What's the point?]
Vocabulary, naming standards & controlled language, however, SHOULD be (but typically were not) an integral part of the dictionary & how it's used to support systems in the organization.
My abiding interest in this particular windmill was fired when I worked at an insurance company with a well populated data dictionary... and in that population process, they'd found 70 different names for the core business concept "policy number." Ouch. Phenomenal waste of time when asking questions or making changes.
A well populated dictionary helps you see that when you ask for M0101, you'll also get MSTR-POL-NO, contract_id, etc. [Oh, dear! I'm repeating myself.]
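A rough sketch of that kind of lookup, in Python. The three names are the ones mentioned above; the `SynonymIndex` class, its methods & the "policy number" concept key are hypothetical, & exact-match lookup is an assumption on my part (a real dictionary would presumably do something smarter).

```python
from collections import defaultdict


class SynonymIndex:
    """Maps every known name to one business concept, & back again."""

    def __init__(self):
        self._concept_of = {}            # name -> concept
        self._names = defaultdict(set)   # concept -> all names seen for it

    def register(self, concept, name):
        self._concept_of[name] = concept
        self._names[concept].add(name)

    def synonyms(self, name):
        """Return every name recorded for the same concept as `name`
        (including `name` itself), or an empty set if it is unknown."""
        concept = self._concept_of.get(name)
        if concept is None:
            return set()
        return set(self._names[concept])


index = SynonymIndex()
for name in ("M0101", "MSTR-POL-NO", "contract_id"):
    index.register("policy number", name)

print(sorted(index.synonyms("M0101")))
```

Asking for any one of the names gives back the whole set, so a question or a change about "policy number" can be chased through every alias at once.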
Most folks who attempted to use data dictionaries never got around to the naming standards & controlled vocabulary aspect. The URL case at the end notes that this success story did indeed control terms down to a list of about 1500.
Basic value here being... just because someone, be it manager, business analyst, project manager or programmer comes up with what appears to be a "new" word doesn't mean the concept isn't already there. The basic dictionary, vocabulary control attitude has to be: "I bet we've already got that." "Are you bringing something totally new to our business to the table?" Highly unlikely. That said, terminology does change. Old words die. New words come into use. But slowly.
They solve the Zip Code vs Postal Code battle by calling it POSTALZIP-CD.
I doubt if this is enough of a foundation to reason across, but it really helps control redundancy.
The very, very few successful dictionary efforts I've encountered were all quite similar. They focused on operational facets of the organization. If what the dictionary does helps the operator on the graveyard shift, then you've built something with legs.
Attempts to build the perfect enterprise data model never seemed to have sustainability. When the champion leaves, the dictionary/modeling effort falls apart. But this is another loooooooong story.
Encountered something better than this 30+ year success story? Please let me know.
Aren't you glad you asked?