Re: [ontolog-forum] What goes into a Lexicon?

To:	"[ontolog-forum] " <ontolog-forum@xxxxxxxxxxxxxxxx>
From:	"Obrst, Leo J." <lobrst@xxxxxxxxx>
Date:	Mon, 27 Feb 2012 19:16:51 +0000
Message-id:	<FDFBC56B2482EE48850DB651ADF7FEB018286621@xxxxxxxxxxxxxxxxxx>

Sure, and we’ve used that approach in the past too. It really depends on both the tools and the ontology developers’ knowledge of English. And of course it helps if you eventually have vocabularies you can map to that.

Thanks,

Leo

From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Amanda Vizedom
Sent: Monday, February 27, 2012 12:04 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] What goes into a Lexicon?

We're in agreement about all of that. The alternative I prefer, though, is different than all of those. I admit, it's difficult to imagine if you haven't worked in an ontology environment that uses it, but if you have, it's head-slappingly better.

I take it that your third example is meant to be the opaque names case. And that's surely bad, if you have opaque names and don't have or use lexical information/labels. But the case I'm arguing for isn't that one; it's the case in which you *always* include at least some lexical information/labels, which are then extensible as you add mappings; it's also the case in which the tools *use* the lexical/label information in display.

So you would never see "Onto1:Z" unless you had specifically overridden some default display prefs to hide all labels.

Say that I am viewing the ontology. My global profile includes a localization to "en-us"; let's say the ontology development / management / viewing tools pick that up.

Now, what's actually in Onto1, if it has had only en-us lexicalization, might be something like this:

Onto1:Z3X2Z

preferred label:"truck" (en-us,n);

subclassOf:Onto1:4SEG2 ("vehicle" (en-us, n));

and also, somewhere in there, perhaps

Onto1:4Y1DF

preferred label:"transport by truck" (en-us, v);

label:"truck" (en-us,v);
subclassOf:Onto1:5K8IU ("transport" (en-us,v));

Great for me, since it has my language. The tools default to showing labels according to my language prefs,

so when I go to your mapping problem, what I see might be:

(ajv1a)

Vocab1: Lorry --->

Onto1:Z3X2Z (truck)

Vocab2: Semi --->

(ajv1b)

Vocab1: Lorry --->

Onto1:Z3X2Z (truck)(en-us, n)

Vocab2: Semi --->

That's great for me since the lexical info is in my language, as easy to work with as your first example,

(lo1)

Vocab1: Lorry --->

Onto1: Truck

Vocab2: Semi --->

And a little bit less error prone, since even in case (ajv1a), where the part of speech info isn't shown, by displaying

the label and ID separately, the tool continually checks my tendency to trust labels too much, and to read into them.

So at least, I'm kept aware that someone labeled this truck, but there is more info available to check, to make sure

it's the right concept. Version (ajv1b), including display of lexical info including part of speech, is even better,

since it tells me that I've got the right sense of "truck", not the verb or something from a different language.

[For less error, I think we'd agree that at least some subclass info might be shown, too, but that's tangential to the

point about names and labels, so I leave it out] .
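
A minimal Python sketch of the two display styles discussed above, assuming a hypothetical in-memory record for a concept. The class and function names (Label, Concept, render) are illustrative only and not taken from any real ontology tool; the point is just that the opaque ID and the label are stored and shown separately.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Label:
    text: str
    locale: str            # localization tag as written in the examples, e.g. "en-us"
    pos: str               # part of speech: "n" or "v"
    preferred: bool = False


@dataclass
class Concept:
    concept_id: str        # opaque name, e.g. "Onto1:Z3X2Z"
    labels: list = field(default_factory=list)


def render(concept, label, show_lexical_info=False):
    """Show the ID and the label side by side; lexical info is optional."""
    shown = f"{concept.concept_id} ({label.text})"
    if show_lexical_info:
        shown += f"({label.locale}, {label.pos})"
    return shown


truck = Concept("Onto1:Z3X2Z", [Label("truck", "en-us", "n", preferred=True)])
label = truck.labels[0]
print(render(truck, label))                          # the (ajv1a) style
print(render(truck, label, show_lexical_info=True))  # the (ajv1b) style
```

Because the ID is never replaced by the label, a reader is reminded that "truck" is only one way of naming the concept.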

Now, say one of our British colleagues is working with the vocab. There's no lexical info to match their localization

preferences, so smart tools will not give them option (ajv1a) -- or (lo1) -- by default. This display is error-promoting

since it gives labels in a non-native localization without making that explicit. So, even if no en-uk lexical info is

available, our British colleague might be shown

(ajv1b)

Vocab1: Lorry --->

Onto1:Z3X2Z (truck)(en-us, n)

Vocab2: Semi --->

Again, they get a meaningful label, so there's no naked ID problem. Better, they get the info that the label provided is

in US English, and is "truck" as Americans use it.
...

Now, let me take you a little further down the road, to point out how much more useful this approach gets when the ontology is used for multilingual content and/or users. Say that some lexical info has been included, or added later, for other languages and localizations. Now, Onto1 includes

Onto1:Z3X2Z

preferred label:"truck" (en-us,n);

preferred label:"lorry" (en-uk, n);

preferred label:"camion" (fr, n);

preferred label:"camión"(es, n);

subclassOf:Onto1:4SEG2 ("vehicle" (en-us, n));

And also, somewhere in there

Onto1:4Y1DF

preferred label:"transport by truck" (en-us, v);

label:"truck" (en-us,v);

preferred label:"transport par camion" (fr, v);

label:"routier" (fr, v);

preferred label:"llevar" (es, v);

label:"transportar en camión" (es, v);

subclassOf:Onto1:5K8IU ("transport" (en-us,v));

[Disclaimer: this is example lexical info for illustration, not claimed to be correct.]

Now our British colleague's localization prefs can be satisfied, and the tools can automatically show:

(ajv2a)

Vocab1: Lorry --->

Onto1:Z3X2Z (lorry)(en-uk, n)

Vocab2: Semi --->

Didn't that mapping job just get easier, faster, and more accurate?
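
One way a tool might choose which label to show is sketched below in Python. This is an assumption-laden illustration, not a description of any existing tool: labels are plain tuples, the user's localization prefs are an ordered list, and pick_label and display are hypothetical helper names. When nothing matches the user's prefs, the sketch falls back to some other preferred label and still shows its localization explicitly.

```python
def pick_label(labels, preferred_locales):
    """Return the first preferred label matching the user's locale order.

    labels: list of (text, locale, pos, is_preferred) tuples.
    If no locale matches, fall back to any preferred label so the user
    never sees a naked ID when some label exists.
    """
    candidates = [lab for lab in labels if lab[3]]
    for locale in preferred_locales:
        for lab in candidates:
            if lab[1] == locale:
                return lab
    return candidates[0] if candidates else None


def display(concept_id, label):
    """Always show the localization, so a fallback label is never mistaken
    for one in the user's own language."""
    if label is None:
        return concept_id
    text, locale, pos, _ = label
    return f"{concept_id} ({text})({locale}, {pos})"


us_only = [("truck", "en-us", "n", True)]
multilingual = us_only + [
    ("lorry", "en-uk", "n", True),
    ("camion", "fr", "n", True),
    ("camión", "es", "n", True),
]
british_prefs = ["en-uk", "en-us"]

print(display("Onto1:Z3X2Z", pick_label(us_only, british_prefs)))
print(display("Onto1:Z3X2Z", pick_label(multilingual, british_prefs)))
```

With only en-us lexicalization the British user gets the explicitly tagged US label; once an en-uk label is added, the same call returns "lorry" without any change to the concept itself.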

I've put these examples in terms of natural languages, but I don't think that's limiting. The same kind of localization can be used for domains and functional areas. For example, were I using this approach back in the USAF effort, where ontologies are developed by specially authorized communities of interest (COIs), I would have liked to be able to specify their dialects: e.g., usaf-log for a broad logistics COI, usaf-med for a medical one, usaf-gen for AF-wide usages not specific to a COI (with these, as with standard natural languages, more localization-specific usages could optionally override broader ones, according to user prefs and profiles). Then I could do very helpful things like say (assume that OntoN and OntoM will be aligned and used by a single system, so they will be encountered by the same users and may be used together to support discovery, etc., even though in this case I'm treating them as developed separately):

OntoN:4J8C2

preferred label:"equipment health"(usaf-gen)

preferred label:"health"(usaf-log)

and

OntoM:8K0L3

preferred label:"medical health"(usaf-gen)

label:"personnel health"(usaf-gen)

preferred label:"health"(usaf-med)

This way, folks in the med domain can see and use their intuitive local shortenings where appropriate, while an unambiguous label remains available for the more general context. Specialists can stay in specialist-thinking mode when they are working with, reviewing, etc., the ontology, and general understandability is maintained.
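
The override behavior described here (a more specific dialect wins, a broader one is the fallback) could be sketched in Python as follows. The BROADER table, which says usaf-log and usaf-med both fall back to usaf-gen, is an assumption made for illustration, as are the names PREFERRED and label_for.

```python
# Hypothetical "broader than" relation between dialect tags.
BROADER = {"usaf-log": "usaf-gen", "usaf-med": "usaf-gen"}

# Preferred labels keyed by (concept ID, dialect), from the example above.
PREFERRED = {
    ("OntoN:4J8C2", "usaf-gen"): "equipment health",
    ("OntoN:4J8C2", "usaf-log"): "health",
    ("OntoM:8K0L3", "usaf-gen"): "medical health",
    ("OntoM:8K0L3", "usaf-med"): "health",
}


def label_for(concept_id, dialect):
    """Walk from the user's dialect to broader ones until a label is found."""
    current = dialect
    while current is not None:
        text = PREFERRED.get((concept_id, current))
        if text is not None:
            return f"{concept_id} ({text}) ({current})"
        current = BROADER.get(current)
    return concept_id  # no label at all: only then show the bare ID


print(label_for("OntoM:8K0L3", "usaf-med"))  # a med specialist's view
print(label_for("OntoM:8K0L3", "usaf-log"))  # a logistician looking at the med concept
print(label_for("OntoN:4J8C2", "usaf-gen"))  # the general view
```

A logistician looking at the medical concept has no usaf-log label for it, so the sketch falls back to the unambiguous usaf-gen label rather than borrowing the med community's "health".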

And, not unimportantly, when either one of these ontologies is submitted for review, *nothing* is put out for approval with the name

(ajv3a)

OntoN:Health

OntoM:Health

and this avoids an immense amount of time otherwise wasted as people who don't understand the tech details but represent their areas in coordinating / approving admission of local ontologies to the enterprise ontology pool fight over what "Health" *really* is and who gets to define it.

Default display for the more general context,

OntoN:4J8C2 (equipment health) (usaf-gen)

and

OntoM:8K0L3 (medical health) (usaf-gen)

avoids that, in committee or in off-line review by someone else later who would otherwise misunderstand and complain. That would be no small improvement.

So, is all this worth emphasizing the distinction between lexical and conceptual elements? Well, it probably depends on the usage. For federation, search and retrieval, and similar usages, it's probably a matter of complexity, scale, and expected life-cycle. The more languages, dialects, jargons, activities, domains, etc. that the user base includes and/or the source info extends into, and the longer the system is expected to live (that is, the more changes and extensions it will likely need to support), the more it's worth it.

Best,

Amanda

On Mon, Feb 27, 2012 at 10:13, Obrst, Leo J. <lobrst@xxxxxxxxx> wrote:

Sure, that’s why I think vocabularies need to be mapped to ontologies, i.e., terms (words, phrases) to concepts (referents). Humans use natural language.

However, pragmatically, when someone is mapping a vocabulary(ies) to an ontology, it’s useful to have the following:

Vocab1: Lorry --->

Onto1: Truck

Vocab2: Semi --->

Rather than:

Vocab1: Lorry --->

Onto1: Rock

Vocab2: Semi --->

Or:

Vocab1: Lorry --->

Onto1: Z

Vocab2: Semi --->

From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Amanda Vizedom
Sent: Sunday, February 26, 2012 7:04 PM

To: [ontolog-forum]
Subject: Re: [ontolog-forum] What goes into a Lexicon?

Leo,

I absolutely agree that humans need meaningful labels. What I dislike is using the *names* for that purpose. Why? Because you only get one name, and it is fixed for all users and contexts. Labels, on the other hand, can be localized and specialized and lexicalized. And tools can use that information along with other user prefs to show the human the labels that are most likely to convey the intended meaning accurately to the user.

Opaque *names* don't mislead. Nor do they deprive the user of meaningful *labels*, when label information is included in the bare minimum content. And if labels are used well, the user can even get the labels that are used in his or her own language, locality, and/or functional or domain specialty, instead of having to fight against frequent misimpressions generated by someone else's.

We can let the tools pick up more of that burden, and get more bang out of our ontologies that way, at least for some usages.

Best,
Amanda

On Feb 26, 2012 5:58 PM, "Obrst, Leo J." <lobrst@xxxxxxxxx> wrote:

The great secret is that concept labels don’t matter, for machine interpretation. They are what they are for humans to nod over. What matters is the logic. If you look at mereotopological axioms, for example, these use labels that are close to words humans would use, e.g., part and proper-part, connected, etc.

So natural language does matter, and how expressions in NL map to those concepts does matter. Which is why everyone should use meaningful names for those labels, even if those labels are only approximate. When I see “h(X) -> m(X)”, it helps to locally paraphrase it (if appropriate) as “human(X) -> mammal(X)” – for human understanding, if not for machine interpretability. It’s better for debugging.

Yes, concepts via their labels don’t wear their semantics on their sleeve, so to speak. But when vocabularies need to be mapped to ontologies by humans, it’s very useful to have ontology nodes (predicates, individuals, etc.) labelled in approximate natural language. Humans thereby make fewer errors.

Thanks,

Leo

From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Rich Cooper
Sent: Sunday, February 26, 2012 3:54 PM
To: '[ontolog-forum] '
Subject: Re: [ontolog-forum] What goes into a Lexicon?

Dear Amanda and David,

Amanda wrote:

What probably causes the confusion is that the nodes in an ontology are not "terms" in this sense; they are NOT bits of language, which may have many meanings, but rather these nodes are *concepts*, abstracted from the various ways they might be expressed, in any language or jargon or context or by anyone. It *is* a rule of ontology that each abstract *concept* must have one formal definition/meaning; that's what makes it a *specific abstract concept*, and what makes it computable as part of an ontology. But there may be any number of ways of expressing this concept in language, symbols, etc., and any particular bit of language may be associated with any number of different concepts. In an ontology, what it looks like for "terms", in the used-language sense, to have multiple meanings is that those terms are associated with multiple abstract concepts, where each of those concepts has a single, formal definition/meaning.

It is an abstraction, IMHO, to call a concept by ANY linguistic term. That is, the concept has such depth of meaning when you look at how it interlocks with other concepts in the lattice that the phrases people use to describe the concept are misleading and even wrong at times – most times it seems.

If concepts had some form of meaningless index, like a social security number, or other social construction that did not use English words, I could believe that the concept is different from the various terms used to describe it. But that is not the practice used on this forum to date. Concepts have always been described here by English terms, not by asemantic indexes.

Given an index value, it could be wikified to show various English terms describing the concept for reference purposes. Then programmers could click on an index, get a pop up page of full description, and even search the set of indexes using Google like phrases to find a list of concept indexes which might be relevant. If that were done, I could believe that the index designates a set of abstract, language free concepts.

But current practice is to refer to a concept with a word or phrase that captures only a tiny portion of the real semantics of that concept. Therefore David’s point of subjectivity blinding the philosophy of a concept set is very appropriate to the ways in which concepts are actually used in software development.

When concepts are named with words or phrases, they are at least as ambiguous as the words or phrases.

HTH,

-Rich

Sincerely,

Rich Cooper

EnglishLogicKernel.com

Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2

From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Amanda Vizedom
Sent: Sunday, February 26, 2012 12:26 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] What goes into a Lexicon?

David,

Actually, I think that you *do* have it wrong. At least, when you say:

"since as far as I know it's a hard & fast ontological rule that requires a term to have a single definition/meaning... an extremely unrealistic constraint for this sort of ugly real world challenge. [If I've got this wrong, please set me straight.] "

Up to that point, your message seemed to be about lexicons, terminologies, and other bits of language. And of course, as you point out, these terminologies vary highly from one context to another, even as used by a single person at different points in time.

And you are right that some people have tried, using various modeling and standardization methods, to fix one meaning/definition for a term, where "term" = bit of language, word, phrase, abbreviation, expression, the kind of thing you would find in a lexicon or controlled vocabulary, and then tried and failed to use the result to represent content across heterogeneous sources. Or tried and failed to impose this, top-down, on all data sources, users, systems, code, etc. Even within an enterprise with the supposed ability to enforce such uniformity, it fails. That approach simply doesn't fit with the realities of how people work, how meaning is bound up with context, how terminology evolves with use and how that local evolution is part of the development of expertise and efficiency.

With respect to all of that, IMHO, you're absolutely right.

Where things go wrong is in the bit I quoted above. It's absolutely NOT a hard & fast rule of ontology that each term have one definition/meaning, if by "term" you still mean what you meant in those previous paragraphs: a bit of language, word, phrase, abbreviation, expression, the kind of thing you would find in a lexicon or controlled vocabulary. In an ontology, each one of those things (e.g., each word, abbreviation, phrase...) can be associated with many different meanings. You might (or might not) even capture some relationship between those term-to-meaning associations and some context factors such as source, business process, localization, etc., if this is important for your usage. But even then, there is no restriction to one meaning per (language) "term" per context.

I hope that makes the matter a bit clearer. Where ontology is successfully used for interoperability, in environments where multiple meanings per used-language "term" are typical and assumed, the ontology can help by capturing and providing mappings between the polysemous used-language "terms" (including data values, field names, and unstructured or semi-structured text) and whichever, and however many, single-meaning abstract concepts those used-language terms are used for. So the used-language "terms" get to keep their many meanings; it's the abstract and formally defined *concepts* that must have just one.

Again, I hope that clarifies things a bit. It's made more confusing by the fact that the linguistic expression "term" is used for multiple things. In some uses, "term" is used to mean a bit of used-language; in some uses, "term" is used to mean concept in an ontology. But despite that bit of typical, confusing polysemy, the fact is that ontologically, the bits of used-language can be associated with many meanings; it's the abstract concepts that have to have just one (though they can have many, and overlapping, used-language expressions).
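
To make the many-to-many picture concrete, here is a small Python sketch under stated assumptions. The concept IDs are borrowed from later messages in this thread, the definition strings are placeholders rather than real formal definitions, and the table names are made up for illustration. Each concept ID maps to exactly one definition, while a used-language term may point at several concepts and a concept may have several terms.

```python
from collections import defaultdict

# One definition per concept ID (placeholder text, not real axioms).
DEFINITIONS = {
    "Onto1:Z3X2Z": "placeholder formal definition of the vehicle concept",
    "Onto1:4Y1DF": "placeholder formal definition of the transporting activity",
}

# Term-to-concept associations: (term, localization, part of speech, concept ID).
ASSOCIATIONS = [
    ("truck", "en-us", "n", "Onto1:Z3X2Z"),
    ("lorry", "en-uk", "n", "Onto1:Z3X2Z"),
    ("truck", "en-us", "v", "Onto1:4Y1DF"),
]

concepts_by_term = defaultdict(set)
terms_by_concept = defaultdict(set)
for term, locale, pos, concept_id in ASSOCIATIONS:
    concepts_by_term[term].add(concept_id)
    terms_by_concept[concept_id].add(term)

# The term keeps its many meanings ...
print(sorted(concepts_by_term["truck"]))
# ... and the concept keeps its many expressions,
print(sorted(terms_by_concept["Onto1:Z3X2Z"]))
# ... but each concept has just one definition.
print(DEFINITIONS["Onto1:Z3X2Z"])
```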

Best,
Amanda

On Feb 25, 2012 9:41 PM, "David Eddy" <deddy@xxxxxxxxxxxxx> wrote:

Rich -

On Feb 25, 2012, at 6:27 PM, Rich Cooper wrote:

> Ontology
> designers that produce a well documented, highly
> learnable and usable ontology (i.e., something
> simple and down in the details of a domain) could
> provide a satisfying brick to many of those first
> time developments.

I am speaking in the context of the legacy software systems that
enable our lives.

The language/lexicon/terminology/slang/whatever already exists in the
applications. Unfortunately it's pretty much been put together with
a single ended one-time pad... & that guy(s) has left the building.

The problem is, unless you have the SME sitting at your side, or lots
& lots of time, the terminology is very difficult to grok. And when
you move to the next assignment, the terminology/lexicon is very
likely to be different, so you have to forget what you just spent 6
months learning.

I would likely argue that this language collection has not been
accumulated with the idea of an organized ontology in mind.

Imposing an organized ontology on this disorganized language
collection probably isn't going to be of much help.

But something that quickly shows or records or suggests that in a
particular context "no" actually means "id" (e.g. soc_sec_no....
social security "number" is not a number, it's an index... a very
different beast)... now that would be useful & likely to be embraced
by the grunts—application owners, analysts, programmers—in the trenches.
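
A rough Python sketch of that kind of context-sensitive lookup follows. Everything in it is hypothetical: the context name "payroll", the glossary entries, and the helper expand are invented for illustration; only the field name soc_sec_no and the reading of "no" as "id" come from the paragraph above.

```python
# (context, token) -> what the token actually means in that context.
# Hypothetical entries; a real glossary would be recorded by the people
# who know the application.
GLOSSARY = {
    ("payroll", "soc"): "social",
    ("payroll", "sec"): "security",
    ("payroll", "no"): "id",  # not a number: an index
}


def expand(field_name, context):
    """Expand each underscore-separated token; unknown tokens pass through."""
    return [GLOSSARY.get((context, tok), tok) for tok in field_name.split("_")]


print(expand("soc_sec_no", "payroll"))
print(expand("soc_sec_no", "some_other_app"))  # no entries: tokens unchanged
```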

How ontologies could add value, I don't have a clue, since as far as
I know it's a hard & fast ontological rule that requires a term to
have a single definition/meaning... an extremely unrealistic
constraint for this sort of ugly real world challenge. [If I've got
this wrong, please set me straight.]

___________________
David Eddy
deddy@xxxxxxxxxxxxx

_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J

