Leo, Amanda, Ed, Doug, and Pat C, (01)
I agree with all your hopes, fears, and observations. (02)
Unfortunately, every system that allows humans to enter data
and read the results must face the fact that nobody will read,
understand, remember, and use the definitions correctly and
consistently. Even the people who write the definitions don't
always use their own definitions consistently. (03)
Leo
> ... people will fight to the death to include their "words",
> mistaking these for the concepts behind them. (04)
That's true, but understandable. Since most people never read the
definitions, the words will have more influence. (05)
AV
> I have also experienced the significant gains in usability and
> efficiency that can come from using concept IDs that are not easily
> interpreted by humans (e.g., hexadecimals at Convera). IMHO, that is
> the way to go -- it's amazing how much confusion is avoided... (06)
EB
> Yes, if you can get the community to maintain the discipline... (07)
I agree with Ed's caveat. That kind of discipline can be maintained
with a small group of highly motivated developers. But it is extremely
hard to continue it as the group expands and they have to train new
hires and customers. (08)
EB
> They pronounce the codes, which mean nothing to the accountant
> or the man on the dock. (09)
Yes. How many people who have a 401(k) plan have read, understood,
and remembered the definition? Anybody who learns those codes just
uses them in the same way that they use any other words or phrases. (010)
DF
> You can find lots of terms in OpenCyc whose names could be interpreted
> in multiple ways, yet whose #$comments do little more than restate the name. (011)
That is also true of 99% of the published OWL ontologies. In fact, most
of the OWL ontologies are grossly underspecified. The only so-called
"definitions" are English comments that the computer ignores. (012)
DF
> Many ontologies (e.g., the OBO ontologies) get around this by using
> numeric strings as IDs of the ontology terms, forcing people to look
> at various comments and alternate names to understand what is
> intended. (013)
That practice began long before OBO. SNOMED used 4-digit IDs with
each digit as a key to some branch of their ontology. As they grew,
they added more digits. But encoding meaning in the digits of an
identifier is a bad practice that computer scientists have been
warning against for years. (014)
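To make the warning concrete, here is a minimal sketch in Python. It
is not SNOMED's actual coding scheme: the codes, the helper names, and
the rule that "the parent is the code minus its last digit" are all
assumptions made up for illustration. It contrasts an ID whose digits
carry the hierarchy with an opaque ID whose hierarchy is stored as data.

```python
from typing import Dict, Optional, Set


def parent_from_digits(code: str) -> Optional[str]:
    """Hypothetical scheme A: the parent is the code minus its last digit."""
    return code[:-1] if len(code) > 1 else None


def reclassify_digit_code(new_parent: str, taken: Set[str]) -> str:
    """Moving a concept in scheme A forces a brand-new identifier:
    the new parent's digits plus the first unused child digit."""
    for digit in "0123456789":
        candidate = new_parent + digit
        if candidate not in taken:
            return candidate
    raise ValueError("branch is full; the scheme needs more digits")


# Hypothetical scheme B: opaque IDs, with the hierarchy kept as explicit data.
opaque_parents: Dict[str, Optional[str]] = {"X17": "X03", "X03": None, "X09": None}


def reclassify_opaque(concept_id: str, new_parent: str) -> None:
    """Moving a concept in scheme B changes a link, never the identifier."""
    opaque_parents[concept_id] = new_parent


old_id = "412"  # made-up code under branch "41"
new_id = reclassify_digit_code("53", taken={"530", "531"})
reclassify_opaque("X17", "X09")

print(old_id, "->", new_id)  # every stored reference to the old ID is now stale
print("X17 parent:", opaque_parents["X17"])
```

In the first scheme, a reclassification changes the identifier, and a
full branch forces longer codes. In the second, the identifier survives
any reorganization.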
In practice, most of the readable terms in OBO are univocal: drugs,
biological species, diseases, medical instruments and procedures.
The string 22298006 is no more precise than 'myocardial infarction'. (015)
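One way to check that claim mechanically is to test whether any label
is shared by more than one identifier. The sketch below is only an
illustration: the first entry is the code and label mentioned above,
and the two HYPO entries are made-up placeholders for an ambiguous
label such as "process".

```python
from collections import defaultdict
from typing import Dict, List

# ID -> preferred label. Only the first entry comes from the text above;
# the HYPO-* entries are hypothetical placeholders.
LABELS: Dict[str, str] = {
    "22298006": "myocardial infarction",
    "HYPO-001": "process",
    "HYPO-002": "process",
}


def ambiguous_labels(labels: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every label that is attached to more than one identifier."""
    ids_by_label: Dict[str, List[str]] = defaultdict(list)
    for concept_id, label in labels.items():
        ids_by_label[label].append(concept_id)
    return {
        label: sorted(ids)
        for label, ids in ids_by_label.items()
        if len(ids) > 1
    }


print(ambiguous_labels(LABELS))
```

A label that never shows up in the result maps one-to-one to its code,
so the code adds no precision. Only the labels that do show up need
renaming or qualification.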
But note Ed's observation. Unreadable codes don't force anybody to
read the definitions. People still start with the glosses, and
rarely study the formal definitions. (016)
AV
> I have also experienced the significant gains in usability and
> efficiency that can come from using concept IDs that are not easily
> interpreted by humans (e.g., hexadecimals at Convera). (017)
Specialists in every branch of science, engineering, medicine, and
the arts have developed precisely defined terminologies that are
unknown to the unwashed. It's possible (but not easy) to select
univocal phrases. (018)
PC
> So, instead of, e.g. a term “Process” (never defined the same way in
> any two upper ontologies I have seen), we might have “ContinuousProcess”
> for phenomena describable by differential equations, or “DiscontinuousProcess”
> for an Event represented as a series of steps (which may be considered
> instantaneous, or take some finite time, which distinction may lead to more
> refinement or expansion of the labels). When more than one ontology is
> being considered, the formal way of doing this is just to use namespace
> prefixes so everyone can use the same label, but specify the namespace
> when creating the logical specification. (019)
I agree. Martin Hepp put a lot of effort into choosing meaningful and
readable names for his GoodRelations ontology. That is one reason why
it became popular. And that's also why Google, Microsoft, Yahoo,
and Yandex adopted it for Schema.org. (020)
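Pat's point about namespace prefixes can be sketched in a few lines of
Python. The prefixes ontoA and ontoB and the example.org URLs are
hypothetical; no real upper ontology is implied. The sketch shows two
ontologies sharing the readable label "Process" while the expanded
names stay distinct in the logical specification.

```python
from typing import Dict, NamedTuple


class QName(NamedTuple):
    """A label qualified by the namespace of the ontology that defines it."""
    namespace: str
    label: str


# Hypothetical prefix table; real prefixes and URLs would come from the ontologies.
NAMESPACES: Dict[str, str] = {
    "ontoA": "http://example.org/ontoA#",
    "ontoB": "http://example.org/ontoB#",
}


def expand(prefixed: str) -> QName:
    """Turn 'prefix:Label' into a fully qualified name."""
    prefix, sep, label = prefixed.partition(":")
    if not sep or not label:
        raise ValueError(f"expected 'prefix:Label', got {prefixed!r}")
    return QName(NAMESPACES[prefix], label)


a = expand("ontoA:Process")
b = expand("ontoB:Process")

print(a.label == b.label)  # the human-readable label is the same
print(a == b)              # the qualified names are different terms
```

The readers see the same word, and the machine sees two terms. That
removes the clash between ontologies, but it does not make either
definition of "Process" any clearer to someone who reads only the label.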
Note that Schema.org uses readable English terms (mostly multi-word)
for its ontology. And note that the primary Google spokesperson
for Schema.org is Guha -- who had been the associate director of Cyc,
the chief designer of RDF, and the co-author (with Pat Hayes) of
the logic base (LBase) for RDF, which uses a version of the same
semantics as Common Logic. So he's familiar with the issues. (021)
In summary, I agree that there is a problem with people reading just
the names instead of looking at the definitions. But making the names
unreadable is not a solution. I don't agree with Guha about everything,
but I think that he (and the other Schemers) made some good choices
for Schema.org. (022)
John (023)
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/