
Re: [ontology-summit] [ReusableContent] Partitioning the problem

To: Ontology Summit 2014 discussion <ontology-summit@xxxxxxxxxxxxxxxx>
From: Cory Casanave <cory-c@xxxxxxxxxxxxxxx>
Date: Mon, 3 Feb 2014 03:04:24 +0000
Message-id: <049ffd52db6d4541a3fe4920ea7df6cc@xxxxxxxxxxxxxxxxxxxxxxxxxx>

Amanda,

Re: I understand, but I think it is mostly a tooling problem

 

Tooling has its limits; complexity has a way of poking through. Consider two scenarios:

 

Tool 1 presents a simplified abstraction that addresses certain stakeholders’ needs. That tool maps to an underlying language (let’s use OWL as an example, just to annoy John). Therefore Tool-1 is not OWL. If you are “forward engineering” from Tool-1 to OWL this can be effective, but once Tool-1 is required to import existing OWL it has to expose all the potential complexity wherever its simplified patterns are not recognized. Tool-1 effectively implements a different language. If there is a better (simpler) language, why don’t we use that one? Do we understand all of the relationships between the Tool-1 language and OWL?

 

Tool 2 is a “UI” for the language: it cleans up presentation and perhaps offers some wizards and shortcuts (e.g. Protégé or XML Spy). Tool-2 can be very effective for those who know OWL, but it then requires an understanding of both OWL and the unique way Tool-2 presents it. We have these kinds of tools for many ontological languages, but still seem to have issues.

 

One of the factors is that we are dealing with languages not designed for tooling; they are designed for experts with text editors. Ontological languages typically sacrifice simplicity and fitness for a specific purpose in favor of generality, inference efficiency, and minimalism. They are not user-focused languages (in general; I’m sure there are exceptions).

 

Consider two examples:

The naming discussion: In the OMG style of modeling languages (UML, BPMN, etc.) there are machine identifiers as well as human names, but these identifiers are explicitly for the tools’ use and are never shown to the users. In OWL (again, for example), humans make up these identifiers and MUST be able to see them in the textual language, so there is confusion and inconsistency. Since the abstract syntax of the language is tightly tied to the human-usable expression, there is no way out. A language designed for tools expects an intermediary between the exchange syntax and the user, and can hide things that are intended (by the language design) to be hidden.
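To make the contrast concrete, here is a minimal OWL/Turtle sketch (the namespace and identifier are made up for illustration): the IRI is what the textual exchange syntax forces authors to write and read, while the human names live in separate label annotations.

    @prefix ex:   <http://example.org/onto#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:C0042 a owl:Class ;          # the machine identifier, visible in the text
        rdfs:label "Dog"@en ,       # the human names, carried as annotations
                   "Chien"@fr .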

 

Another example is what UML calls multiplicity and the representation of this idea in OWL. Saying 1..* on a relation is easy to grasp; subclassing a restriction seems a complex way to express a simple idea. OWL is built for the inference engine, not the user. If I “hide” this in a tool and there is some unexpected pattern of using restrictions, I may not be able to properly “round trip” the concept.
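For instance, the UML-style statement “a Person has 1..* parents” would come out as a restriction subclass in OWL, roughly like this (a sketch with made-up names):

    @prefix ex:   <http://example.org/onto#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    ex:Person rdfs:subClassOf [
        a owl:Restriction ;                 # anonymous restriction class
        owl:onProperty ex:hasParent ;
        owl:minCardinality "1"^^xsd:nonNegativeInteger
    ] .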

 

I am not suggesting UML as the gold standard here; it has its own horrendous complexities. But it does highlight the difference between languages designed for tooling and those designed for direct use of their exchange syntax. Note that the tooling approach can still support textual syntaxes, but they tend to be a layer over what the machines interchange.

 

So, are tools important? YES. But don’t look to tools to solve inherent language complexities.

 

Regards,

Cory Casanave

 

From: ontology-summit-bounces@xxxxxxxxxxxxxxxx [mailto:ontology-summit-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Amanda Vizedom
Sent: Saturday, February 01, 2014 6:59 PM
To: Ontology Summit 2014 discussion
Subject: Re: [ontology-summit] [ReusableContent] Partitioning the problem

 

 

 

On Sat, Feb 1, 2014 at 6:01 PM, Ali SH <asaegyn+out@xxxxxxxxx> wrote:

Dear Amanda, Kingsley and David,

On Sat, Feb 1, 2014 at 3:04 PM, Amanda Vizedom <amanda.vizedom@xxxxxxxxx> wrote:

 

Your proposed solution - as best I can tell, to choose one target set of humans and make the (meant-for-machine-consumption) URIs (or even names!) understandable to them, while ignoring the polysemy-tolerant, built-for-natural-language labeling features of the ontology language - is inherently antithetical to reuse (including use over time).

 

I don't believe David is saying this. I sympathize with his conundrum. He isn't saying that the human-readable URIs are intended to exactly denote the semantics of what is represented in the ontology. Rather, he is saying that people who are using these URIs to build applications, in the form of code or queries riding on top of the ontologies, have more difficulty if they are anchored in a completely opaque naming system.

 

In my experience, that just isn't true. 

 

Ignore the examples of really long and confusing identifiers that have been thrown around here. Much shorter and simpler character strings can be used for IDs within an ontology. Sure, use namespaces or another mechanism to localize to the particular ontology (or microtheory, or ...?); that's great. Strings of six hexadecimal characters, for example, are well within the capability of most coders to compare. I am relatively poor at number and non-word recall, and I found one such system, quite large, to be easy to work with. Did I memorize what concepts each of these strings corresponded to? No; whether I was working on the ontology directly, browsing it, looking stuff up in it, querying, developing pattern-matching code that used the ontology, debugging weird test results from an indexing run, or what have you, the hex-code ID could be (and usually was, by default) shown *with* a pref label for my language.

Folks working on extending the French lexicalization or doing QA testing for a francophone localization could have the default show the pref label in (fr) or some localization thereof, for example. So I might see 4G61XS (dog) and Claude might see 4G61XS (chien) in the indexer results or while browsing the ontology. If we were developing rules or tests and couldn't remember the name of the concept we wanted, I could search on "dog" and compare the returns (multiple, since labels aren't unique) to find the right one, and Claude could do the same searching on "chien." Both of us would be reminded and motivated to check the other "dog"/"chien" matches. In my experience, that apparent burden in fact results in greater efficiency and accuracy; without that check, and with a suggestive name or label-only view, the rate at which people guess or assume and use the wrong one is high enough to cause a lot of extra work.
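In OWL/SKOS terms, that setup amounts to nothing more than an opaque ID carrying per-language preferred labels, along these lines (the namespace is hypothetical):

    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .

    <http://example.org/kb#4G61XS>
        skos:prefLabel "dog"@en ,      # shown to me as 4G61XS (dog)
                       "chien"@fr .    # shown to Claude as 4G61XS (chien)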

 

His example with the SPARQL queries is spot on, and something I've run into as well. When queries are written using completely opaque URIs, the task of maintaining, debugging, and updating them is significantly complicated, leading to more opportunities for errors.

 

I understand, but I think it is mostly a tooling problem. The tools do not use the appropriate formal language features. Humans shouldn't be writing or debugging SPARQL queries with only the concept ID visible, whether it is opaque or suggestive. Either way, there is extra lookup (out of the cognitive task space) and a greater likelihood of error than is really tenable. Unfortunately, that is mostly the state of the art in open/COTS tools, but the way to fix it isn't to make the IDs more suggestive (and conducive to error); it's to make the tools use the human-oriented features of the language when interfacing with humans. BTW, I specified state of the art in *COTS* tools, because I've seen a number of proprietary tools, developed for use within a company only, that don't make this same error. I'm perpetually frustrated that we don't have the same level of tooling in the open-source or COTS worlds. But it is not a coincidence that the companies in question have done well in developing semantic enterprise or web systems with those ontologies as components. They take their ontologies, and the processes concerning them, rather seriously.
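To illustrate (with hypothetical IRIs): even a plain SPARQL front end could fetch the preferred label in the user's language alongside the opaque concept IDs a query uses, so the human never has to work from the bare identifier alone:

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    SELECT ?animal ?conceptLabel WHERE {
        ?animal a <http://example.org/kb#4G61XS> .                  # opaque concept ID
        <http://example.org/kb#4G61XS> skos:prefLabel ?conceptLabel .
        FILTER ( langMatches(lang(?conceptLabel), "en") )           # or "fr" for Claude
    }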

 

If I've understood David's point correctly: in the same way that software developers employ useful NL analogues for variable/class names to make the code more readable, ontologists should consider using similarly accessible labels. As someone who has had to debug SPARQL queries written using esoteric naming systems, the fact that those terms had "pref-labels" in a multitude of languages did not help one iota. I had to constantly look up what each term referred to, and it increased the debugging time by perhaps an order of magnitude.

As I suggested in a previous email, there's a balance to be struck, since a pure linguistic ID can indeed lead to unintended or hopeful semantics. But something like:

human.n.05


is readable to a human, and also clearly not intended to be interpreted naively. One can still use labels (a la SKOS) to display different terms (e.g. homme) when presenting such concepts to SMEs or other targeted audiences, but when one is building applications using the ontology identifiers, something like human.n.05 vs. RD54383 makes it much easier to follow the logic and debug.
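For concreteness, either style of identifier can carry the same SKOS labels; the difference is only in what the raw ID itself suggests (a sketch, with a hypothetical namespace):

    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .

    <http://example.org/kb#human.n.05>  skos:prefLabel "homme"@fr .   # readable ID
    <http://example.org/kb#RD54383>     skos:prefLabel "homme"@fr .   # opaque ID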

 

That's in between, I think. You will still have to look up which human-related concept that is, and to Claude or someone else it may be equally opaque. I still don't see the advantage over having an IDE, or parts thereof, that shows you a prefLabel along with the ID, according to your settings.

 

As Simon and Ed alluded to, our brains have developed ways of holding various referents in our heads. We detect and utilize name patterns based on the shape and length of words. When the naming system follows an esoteric style, we don't have the ability to use these facilities, leading to potential errors and slower work.

 

True, but it is still the case that what is intelligible to some is opaque to others, and that suggestive often means giving rise to misuse. OWL has the built-in capabilities to give us precision and developer-appropriate language suggestions together. It's perfectly feasible to build efficient development tools that do so; one can even get a little fancier and connect tools to the ontology to allow, for example, using the language part to look at alternatives without leaving the cognitive space. I know this is possible because I've used such tools. But the majority of tools, and all the open or COTS ones I know of, just haven't had this kind of human/cognitive interface attention given to them.

 

I just don't think the solution is to treat the ontology language as more impoverished than it really is. We know there is a long way to go in improving tools, anyway. I'd say that one of the improvements should be to make tools that use the existing support for co-existing human readability and machine uniqueness.

 

Amanda

 

 

 





_________________________________________________________________
Msg Archives: http://ontolog.cim3.net/forum/ontology-summit/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontology-summit/
Unsubscribe: mailto:ontology-summit-leave@xxxxxxxxxxxxxxxx
Community Files: http://ontolog.cim3.net/file/work/OntologySummit2014/
Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014
Community Portal: http://ontolog.cim3.net/wiki/

 

