Joel - (01)
On Jan 4, 2010, at 5:22 PM, Joel Bender wrote: (02)
>> For this (mythical?) ontology beast:
>> - it exists in some form so accessible/useful that people who
>> know (and probably care) nothing about ontologies will find it &
>> use it with minimal training... some motivated users going so far
>> as to helping to correct & extend it
>> - the ontology(s) become self sustaining
>
> A laudable goal for an open ontology repository. Drawing on an
> analogy with Wikipedia, what questions would you ask someone who
> wants the authority to edit a page? What slices of the OBoK would
> you expect editors/moderators to have? (03)
I most vociferously object to your promiscuous use of "repository." (04)
[RANT=ON] (05)
I will assume that you do not mean: repository = database, which is
clearly how many people use "repository" these days. They've been
using "database" for a long time & "repository" sounds more
sophisticated. Bletch!!!! (06)
First the definitional, contextual details. (07)
"data dictionary = metadata repository" in my professional
experience. I acknowledge that many people refer to a list of data
elements as a "data dictionary." That is NOT how I use the term
"data dictionary." (08)
When I formally present on this topic I lead with a slide that says:
"Definition: Repository = burial place" (09)
The Oxford American Dictionary offers: "a place, building, or
receptacle where things are or may be stored : a deep repository for
nuclear waste." Notice it offers nothing in terms of getting
something back OUT of said repository. (010)
The greater context here is (software) information systems
development & maintenance in large (Fortune 500 scale) organizations
in the US, Canada & UK. Such organizations all have huge, sprawling
software portfolios that are a constantly evolving (mutating?) mix of
packages, custom-written systems & systems acquired by buying other companies.
These are the only organizations that will have the recognized need &
resources to chase the ontology rabbit. (011)
We've all worked for and experienced such organizations. A
distinguishing characteristic is they're S-L-O-W to change. A major
reason they're so slow is that no one understands how information
flows through & across the systems with a net, entirely predictable
result that it takes a loooooong time to make even small changes.
As an example: there was an article in ComputerWorld towards the end
of 2009 that said IBM has 4,000 (4,500?) "applications." No
definition was offered for "application." Personally I assume (with
absolutely no direct supporting evidence) that an application is
comprised of a collection of (probably) many systems. (012)
Data dictionary products have been with us since the dawn of database
engines in the early 1970s. Their intent was to provide accurate
documentation on how the gazillions of interconnected pieces in a
software portfolio are related to each other. If organizations had
accurate, complete data dictionaries in place in the 1990s, Y2K would
have been a total non-event. (013)
The harsh market reality is that data dictionaries over their 35+ year
life span in the US market have had (at best) a 5% success rate. (014)
As I'm sure John Sowa can attest, after the fiasco of FS (Future
System), IBM had a last run at the "enterprise metadata
repository" (again, it's the same thing as a data dictionary) with
the disastrous AD/Cycle effort in the late 1980s. At the center of
the AD/Cycle effort was RepositoryManager (affectionately called
RepoMan), comprised of some 1,700 DB2 tables. Truly gave new meaning
to "putting out the lights." (015)
A very short overview of what it takes to have a successful data
dictionary effort is here... http://www.tdan.com/view-articles/6123
Notice that this organization has a controlled vocabulary of
about 1,500 terms. If I had a death wish I could ask the Marine who
put this together if he was thinking about an ontology when he did
this, but I think I'll refrain. (016)
There are multiple external issues that must be addressed (e.g.
AUTOMATED) to make a data dictionary self sustaining. I have had
discussions with organizations that thought they had a successful
effort yet were totally puzzled as to why the dictionary needed to be
integrated into the software configuration management process. The
key person leaves & the whole thing collapses after 20 years of success. (017)
My point here is to object to your far too casual use of the word
"repository." There are currently no such capable tools on the
market. And even if such "repository" tools existed, organizations
wouldn't know how to weave them into their process. (018)
A major hurdle is (crude example): when someone out in the
organizational boonies discovers a "new" term, the process has to
push back very rapidly... "I'm sorry Dave, we don't think Postal Code
is a new thing... please examine this collection of Zip Code
artifacts..." If it takes a week to respond, the whole effort is a
waste. (019)
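
To make the crude example concrete, here is a minimal sketch (in Python) of the kind of automated push-back check I have in mind. Everything in it is hypothetical and made up for illustration: the vocabulary entries, the artifact names and the function names. A real dictionary would hold thousands of terms and would need fuzzier matching than this exact-synonym lookup.

```python
# Hypothetical controlled vocabulary: preferred term -> known synonyms and
# the artifacts (columns, copybook fields, ...) already tied to that term.
# All entries below are illustrative assumptions, not real data.
VOCABULARY = {
    "Zip Code": {
        "synonyms": {"postal code", "zip", "postcode"},
        "artifacts": ["CUSTOMER.ZIP_CD", "ADDR-ZIP-CODE"],
    },
}


def normalize(term):
    """Lower-case the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def check_new_term(term):
    """Return (preferred_term, artifacts) if the term is already known, else None."""
    wanted = normalize(term)
    for preferred, entry in VOCABULARY.items():
        known = {normalize(preferred)} | {normalize(s) for s in entry["synonyms"]}
        if wanted in known:
            return preferred, entry["artifacts"]
    return None


def push_back(user, term):
    """Build the immediate response to someone proposing a 'new' term."""
    hit = check_new_term(term)
    if hit is None:
        return f"Thanks {user}, '{term}' looks new. Queued for review."
    preferred, artifacts = hit
    return (f"I'm sorry {user}, we don't think '{term}' is a new thing... "
            f"please examine these '{preferred}' artifacts: {', '.join(artifacts)}")


print(push_back("Dave", "Postal Code"))
```

The point of the sketch is the response time: the lookup answers while Dave is still at his keyboard, not a week later.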
None of this background begins to touch on the semantic/ontology
issue. No data dictionary products attempt(ed) to address the
semantic challenge. The tool always assumed the human knew what they
were doing. When the semantic/language/vocabulary/ontology issue is
ignored, people will not be able to find what they need and they will
quickly & for all time write off said dictionary/repository as a
waste of time. (020)
Things are actually worse now that we have Google. Type in a word
(maybe two if you dare push the bleeding edge?) & you get 10,000,000
hits to choose from. This approach does not work inside an
organization. First you have to KNOW what you're looking for. If
you don't already know that SSN is a likely acronym for "social
security number" you're simply out of luck. The fact that SSN also
means many things other than "social security number" is just another
wrinkle. (021)
So... what I really wanted to say is, please do not just blithely
assume that once we've found/built/created/hallucinated our ontology,
we can just throw it in one of these mythical "repository" things. (022)
[RANT=OFF] (023)
re: going the Wikipedia route for the OBoK... (024)
I would start with the 90-9-1 assumption... in an online community of
practice, 90% of people silently lurk (read only), 9% contribute
occasionally, and 1% are the lifeblood. (025)
Rather than worry about people munging up the place with random/bogus
edits, I'd tightly control the write ability to only known good
players. And make it easy for someone who's interested in being an
active contributor to petition for editing rights. (026)
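
A rough sketch of that policy, again in Python and again purely illustrative (the class, method and user names are my own inventions, not part of any existing wiki or repository tool): everyone can read, only known good players can write, and anyone else can petition an existing editor for rights.

```python
class EditorRegistry:
    """Hypothetical write-access list: known editors plus pending petitions."""

    def __init__(self, trusted):
        self.editors = set(trusted)   # the "known good players"
        self.petitions = []           # list of (user, reason) awaiting review

    def can_edit(self, user):
        return user in self.editors

    def petition(self, user, reason):
        """Record a request for editing rights (ignored if redundant)."""
        already_pending = any(u == user for u, _ in self.petitions)
        if user not in self.editors and not already_pending:
            self.petitions.append((user, reason))

    def approve(self, moderator, user):
        """An existing editor grants rights to a petitioner. Returns True on success."""
        if moderator not in self.editors:
            raise PermissionError(f"{moderator} is not an editor")
        if not any(u == user for u, _ in self.petitions):
            return False
        self.petitions = [p for p in self.petitions if p[0] != user]
        self.editors.add(user)
        return True


registry = EditorRegistry(trusted={"alice"})
registry.petition("bob", "I maintain the billing vocabulary")
registry.approve("alice", "bob")
print(registry.can_edit("bob"), registry.can_edit("lurker"))
```

The petition step is deliberately cheap: one call, one approval, no forms.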
Not to forget... 99.999% of the people in an organization will have
zero interest in the finer points of an ontology (matter of fact
they'll likely be pissed at you for using a big word that you never
adequately define in order to intimidate them)... what they want is
an easily accessible resource where they can find useful
information. A major ease-of-use challenge will be to package said
ontology in a non-threatening/intimidating way for the audience who
has zero interest in the finer points of ontologies. (027)
YOU may be fascinated with the nuances of language... your potential
audience/customers have zero interest. (028)
Think what would happen if you told someone who wanted to learn to
drive a car.... "We'll start with learning how the transmission
works... then we'll move on to how valve faces are machined..." NO
customers. (029)
Just my two cents... aren't you glad you asked? (030)
___________________
David Eddy
deddy@xxxxxxxxxxxxx (031)
781-455-0949 (032)