
Re: [ontology-summit] [ReusableContent] Partitioning the problem

To: Ontology Summit 2014 discussion <ontology-summit@xxxxxxxxxxxxxxxx>
From: Eric Prud'hommeaux <eric@xxxxxx>
Date: Sun, 2 Feb 2014 09:41:05 -0500
Message-id: <20140202144101.GA17435@xxxxxx>
* Amanda Vizedom <amanda.vizedom@xxxxxxxxx> [2014-02-01 18:59-0500]
> On Sat, Feb 1, 2014 at 6:01 PM, Ali SH <asaegyn+out@xxxxxxxxx> wrote:
> 
> > Dear Amanda, Kingsley and David,
> >
> > On Sat, Feb 1, 2014 at 3:04 PM, Amanda Vizedom <amanda.vizedom@xxxxxxxxx> wrote:
> >
> >>
> >> Your proposed solution - as best I can tell, to choose one target set of
> >> humans and make the (meant for machine consumption) URIs (or even names!)
> >> understandable to them, while ignoring the polysemy-tolerant,
> >> built-for-natural-language labeling features of the ontology language, is
> >> inherently antithetical to reuse (including use over time).
> >>
> >
> > I don't believe David is saying this. I sympathize with his conundrum. He
> > isn't saying that the human readable URI's are intended to exactly denote
> > the semantics of what is represented in the ontology.  Rather, that people
> > who are using these URI's to build applications, in the form of code or
> > queries riding on top of the ontologies have more difficulty if they are
> > anchored in a completely opaque naming system.
> >
> 
> In my experience, that just isn't true.
> 
> Ignore that examples of really long and confusing identifiers have been
> thrown around, here. Much shorter and simpler character strings can be used
> for IDs within an ontology. Sure, use namespaces or other mechanism to
> localize to the particular ontology (or microtheory, or ...?); that's
> great. 6 hexadecimal char strings, for example, are well within the
> capability of most coders to compare. I am relatively poor at number and
> non-word recall, and I found one such system, quite large, to be easy to
> work with. Did I memorize what concepts each of these strings corresponded
> to? No; whether I was working on the ontology directly, browsing it,
> looking stuff up in it, querying, developing pattern-matching code that
> used the ontology, debugging weird test results from an indexing run, or
> what have you, the hex code ID could be (and usually was, by default) shown *with*
> a pref label for my language. Folks working in extending the French
> lexicalization or doing QA testing for a francophone localization could
> have the default show pref label in (fr) or some localization thereof, for
> example. So I might see 4G61XS (dog) and Claude might see 4G61XS (chien) in
> the indexer results or while browsing the ontology. If we were developing
> rules or tests and couldn't remember what the name of the concept we wanted was,
> I could search on "dog" and compare the returns (multiple, since labels
> aren't unique) to find the right one, and Claude could do the same
> searching on "chien." Both of us would be reminded and motivated to check
> the other "dog"/"chien" matches.  In my experience, that apparent burden in
> fact results in a greater efficiency and accuracy; without that check, and
> with a suggestive name or label-only view, the rate at which people guess
> or assume and use the wrong one is high enough to cause a lot of extra work.

Ideally, someone will have the resources to compare development and
deployment of some domain ontology with both opaque and e.g.
English-intuitive IRIs and publish it as a Gartner business case.
We'd want to explore development time, uptake, adoption time, and some
quantified assessment of the cost of the errors Amanda alludes to above.
(At the same time, we'd probably want to study the ROI on binding to
upper-level ontologies like BFO, DOLCE, UMBEL, Cyc...)


> > His example with the SPARQL queries is spot on, and something I've run
> > into as well. When queries are written using completely opaque URI's, the
> > task of maintaining, debugging and updating them is significantly
> > complicated, leading to more opportunities for errors.
> >
> 
> I understand, but I think it is mostly a tooling problem. The tools do not
> use the appropriate formal language features. Humans shouldn't be writing
> or debugging SPARQL queries with only the concept ID visible, whether it is
> opaque or suggestive. Either way, there is extra lookup (out of the
> cognitive task space) and a greater likelihood of error than is really
> tenable. Unfortunately, that is mostly the state of the art in open/COTS
> tools, but the way to fix it isn't to make the IDs more suggestive (and
> conducive to error); it's to make the tools use the human-oriented features
> of the language when interfacing with humans. BTW, I specified state of the
> art in *COTS* tools, because I've seen a number of proprietary tools,
> developed for use within a company only, that don't make this same error.
> I'm perpetually frustrated that we don't have the same level of tooling in
> the open-source or COTS worlds. But it is not a coincidence that the
> companies in question have done well in developing semantic enterprise or
> web systems with those ontologies as components. They take their
> ontologies, and the processes concerning them, rather seriously.

I agree, it's a tooling problem, but I think it's a tooling problem
that we have to take into account in our design. In the short term, we
face choices like favoring adoption by using intuitive labels
vs. using opaque identifiers and thereby pressuring tool and mindset
development.

I note that 50 years have seen the development of some tools to enable
GUI assembly of SQL queries, but that a search for "SELECT FROM ORDER
BY" doesn't reveal evidence of many machine-generated queries (some
from language-specific bindings like Ruby on Rails). I suspect that
the bulk of SQL out there is embedded in quotes in other programming
languages and that SPARQL would be as well for the foreseeable future.
Protégé's ability to customize the interface to use e.g. rdfs:label
enables one to create ontologies with opaque identifiers but doesn't
do anything to favor their adoption.
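
As a rough illustration of the "ID shown with a label" display Amanda
describes, here is a minimal Python sketch. Everything in it is an
assumption for illustration: the in-memory PREF_LABELS table, the
render and search helpers, and every ID other than 4G61XS are
hypothetical, and a real tool would read the labels from the ontology
rather than from a hard-coded dict.

```python
# Hypothetical label store: opaque concept ID -> {language tag: preferred label}.
# Only 4G61XS comes from the thread; the other IDs and labels are invented.
PREF_LABELS = {
    "4G61XS": {"en": "dog", "fr": "chien"},
    "7B20QK": {"en": "dog", "fr": "chien"},  # a second, distinct "dog" concept
    "9C33ZT": {"en": "cat", "fr": "chat"},
}


def render(concept_id, lang):
    """Show an opaque ID with its preferred label in the reader's language."""
    label = PREF_LABELS.get(concept_id, {}).get(lang)
    # Fall back to the bare ID when no label exists for that language.
    return f"{concept_id} ({label})" if label else concept_id


def search(label, lang):
    """Return every concept ID carrying this label; labels are not unique."""
    return sorted(
        cid for cid, labels in PREF_LABELS.items() if labels.get(lang) == label
    )


print(render("4G61XS", "en"))  # what an English-speaking developer sees
print(render("4G61XS", "fr"))  # what a francophone developer sees
print(search("dog", "en"))     # all candidates, so the right one can be picked
```

The search deliberately returns every match instead of the first one,
since the point made above is that being shown the other "dog"
candidates is what prevents picking the wrong concept.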


> > If I've understood David's point correctly -- the same way that software
> > developers employ useful NL *analogues* for the variable / class names to
> > make the code more readable, ontologists should consider using similarly
> > somewhat accessible labels. As someone who has had to debug SPARQL queries
> > written using esoteric naming systems, the fact that those terms had
> > "pref-labels" in a multitude of languages did not help one iota. I had to
> > constantly look up what the term referred to, and it increased the debugging
> > time by perhaps an order of magnitude.
> >
> > As I suggested in a previous email, there's a balance to be struck, since
> > a pure linguistic ID can indeed lead to unintended or hopeful semantics.
> > But something like:
> >
> > human.n.05
> >
> > is readable to a human, and also clearly not intended to be interpreted
> > naively. One can *still* use labels (a la SKOS) to display different
> > terms (e.g. *homme*) when presenting such concepts to SME's or other
> > targeted audiences, but when one is building applications using the
> > ontology identifiers, having something like human.n.05 vs RD54383 is much
> > easier to follow the logic and debug.
> >
> 
> That's in between, I think. You will still have to look up which
> human-related concept that is, and to Claude or someone else they may be
> equally opaque. I still don't see the advantage over having an IDE, or
> parts thereof that (a) shows you a prefLabel along with ID, according to
> your settings.
> 
> 
> > As Simon and Ed alluded to, our brains have developed ways for holding
> > various referents in our heads. We detect and utilize name patterns based
> > on the shape and length of words. When the naming system follows an
> > esoteric style, we don't have the ability to use these facilities, leading
> > to potential errors and slower work.
> >
> 
> True, but it is still the case that intelligible to some is opaque to
> others, and that suggestive often means giving rise to misuse. OWL has the
> built-in capabilities to give us precision and developer-appropriate
> language suggestions together. It's perfectly feasible to build efficient
> development tools that do so; one can even get a little fancier and connect
> tools to ontology to allow, for example, using the language part to look at
> alternatives without leaving the cognitive space. I know this is possible
> because I've used them. But the majority  of tools, and all the open or
> COTS ones I know of, just haven't had this kind of Human/cognitive
> interface attention given to them.

I'm quite interested in your experience with these tools and whether
they represent something that would uniformly divorce us from reading
and typing IRIs. I hand-write a lot of SPARQL queries, but I would be
happy to use a preprocessor which converted the qnames in a SPARQL
query from (prefix ':' rdfs label) to the actual ontology IRIs, and
reversed this process for query results. Likewise, I'd be happy to
edit e.g. Turtle/Trig that way. The cost would be that this extra tool
or interface gizmo would have to be ubiquitous enough that I wouldn't
just avoid ontologies with opaque identifiers.
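
To make the preprocessor idea concrete, here is a minimal Python
sketch under several assumptions: the prefix "ex", the label-to-ID
table, and the to_ids/to_labels helpers are all hypothetical, labels
are assumed to be single tokens that are unique within a prefix, and
the regex is a simplification of SPARQL's prefixed-name grammar (it
does not skip string literals or IRIs that happen to contain a
matching token).

```python
import re

# Hypothetical mapping for one prefix: rdfs:label -> opaque local name.
# The prefix, labels, and IDs are assumptions made up for this sketch.
LABEL_TO_ID = {"ex": {"dog": "4G61XS", "owner": "8K02LM"}}
# Inverse mapping, used to turn IDs in results back into labels.
ID_TO_LABEL = {p: {v: k for k, v in m.items()} for p, m in LABEL_TO_ID.items()}

# Simplified prefixed-name pattern: prefix, colon, local part.
QNAME = re.compile(r"\b([A-Za-z][\w-]*):([\w-]+)")


def _swap(text, table):
    """Rewrite the local part of each known prefixed name using `table`."""
    def repl(match):
        prefix, local = match.group(1), match.group(2)
        mapped = table.get(prefix, {}).get(local)
        # Unknown prefixes or local names are left exactly as written.
        return f"{prefix}:{mapped}" if mapped else match.group(0)
    return QNAME.sub(repl, text)


def to_ids(query):
    """Label form (what a human types) -> opaque-ID form (what the store sees)."""
    return _swap(query, LABEL_TO_ID)


def to_labels(text):
    """Opaque-ID form (query or results) -> label form for human reading."""
    return _swap(text, ID_TO_LABEL)


readable = "SELECT ?d WHERE { ?d a ex:dog ; ex:owner ?o }"
print(to_ids(readable))
print(to_labels(to_ids(readable)))
```

Because labels are not unique in general, a real version would have
to refuse or flag ambiguous labels rather than silently pick one; the
sketch sidesteps that by assuming a one-to-one table.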


> I just don't think the solution is to treat the ontology language as more
> impoverished than it really is. We know there is far to go in improving
> tools, anyway. I'd say that one of the improvements should be to make tools
> that use the existing support for co-existing human-readability and
> machine-uniqueness.

Despite my skepticism above, I'm certainly happy to be proven wrong. I
don't mean to shout "long live textedit" from the mountain tops, but I
also don't want to impede the adoption of RDF in industry.


> Amanda
> 
> 
> 
> 
> >
> > --
> >
> >
> > (•`'·.¸(`'·.¸(•)¸.·'´)¸.·'´•) .,.,
> >
> >
> > _________________________________________________________________
> > Msg Archives: http://ontolog.cim3.net/forum/ontology-summit/
> > Subscribe/Config:
> > http://ontolog.cim3.net/mailman/listinfo/ontology-summit/
> > Unsubscribe: mailto:ontology-summit-leave@xxxxxxxxxxxxxxxx
> > Community Files: http://ontolog.cim3.net/file/work/OntologySummit2014/
> > Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014
> > Community Portal: http://ontolog.cim3.net/wiki/
> >
> >



-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@xxxxxx)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
