Re: [ontolog-forum] OntoNotes and the Omega ontology

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Cecil O Lynch" <clynch@xxxxxxxxxxxxxx>
Date: Sat, 25 Sep 2010 17:14:40 -0400
Message-id: <025101cb5cf6$aad88d60$0089a820$@com>
Well, I was an undergrad chemist and I don't think there are anywhere near
millions, and probably not even 5,000, unique concepts. That is part of why
chemistry is pretty simple to learn once you have the base grammar: every
concept logically derives from that vocabulary as a pre-coordinated group of
concepts that can be parsed logically. Note that the 118 concepts will only
get you through the inorganic side of chemistry and do not help much with
organic nomenclature or with biochemistry.    (01)

I have a colleague who has looked at more than 500,000 medical records from
a series of hospitals and has distilled that vocabulary to fewer than 5,000
concepts in that usage scenario as well, though SNOMED has over 300,000
unique concepts (so it seems that general usage limits this to some extent).    (02)

So I do think you can get pretty good results with a relatively small set of
concepts for the 2 standard deviations on either side of the bell curve in
any particular domain, but it would be difficult to extend to the other 5%
of the functioning of an organization, and even more difficult to work
across disciplines.    (03)

Cecil
-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of David Eddy
Sent: Saturday, September 25, 2010 4:30 PM
To: [ontolog-forum] 
Subject: [SPAM] Re: [ontolog-forum] OntoNotes and the Omega ontology    (04)

John -    (05)

On Sep 25, 2010, at 2:22 PM, John F. Sowa wrote:    (06)

> For a particular application, 500 words might be sufficient.  But the
> vocabulary of our natural languages must support all the applications
> that anybody has ever implemented or even imagined.
>
> Just the vocabulary of organic chemistry requires millions of words
> for all the known molecules.  But I'll admit that the chemists have
> algorithms for defining new words from a much smaller number of
> primitives.    (07)

As always I'm coming at this from the opposite end of the spectrum.    (08)

It is my thesis that the number of concepts needed for an organization to
function is in the 1500 to 6000 range.  Concepts unfortunately will
have MANY names, acronyms, abbreviations, etc. The 1500 is from a
smallish logistics company (with an implemented Zachman BSP, I might  
add) where the "glossary" function in the central, mainframe data  
dictionary contains approx. 1500+ terms.  This term/concept list is  
not frozen for all time... terms fall out of use, new ones come  
along.  But the point of this controlled technical glossary is that  
analysts/programmers do not just make up new terms/abbreviations on a  
whim... which is what happens in most shops.    (09)


The 6,000 is allegedly the Pentagon.    (010)

I have no knowledge or experience with organic chemistry so I'll have  
to revert to business applications language... I assume not as  
complex as organic chemistry.    (011)

But then again, chemistry seems to survive pretty well with 118 basic  
building blocks ("primitives"?) in the periodic table.  Somehow from  
those 118 core "concepts" language explodes to millions of terms.    (012)

Most organizations focus on a relatively narrow niche of "reality"...  
HP & IBM do hardware, software & consulting... but stocks & bonds are  
not a core part of their business.  Fidelity Investments does stocks  
& bonds but not hardware & software.    (013)



> Furthermore, I don't believe that there is any "unnatural language"
> that humans are capable of learning and using effectively.    (014)


Agreed... but we're on a track where we continually jump from project  
to project.  The days of a dedicated individual staying with a system  
for decades are ancient history.    (015)

By "unnatural language" I mean thingys with labels like M0101, M0102,  
etc.  May make sense to you, since you've been working this system  
for 5 years, but I'm a newbie... and since you're the SME I don't  
have access to you.  Even if I did have access to you, I wouldn't  
know how to form a good question.    (016)

When the new analyst/programmer opens up the code, how will they KNOW  
what MSTR-MENSA-NO actually MEANS?  Eventually they will learn, but  
it will take TIME & mistakes.  And by the time they've learned the  
language of the system, they're likely to be rotated out, taking  
their knowledge of quirks & oddities with them.    (017)



Here's what I mean by unnatural language... and these data element  
names are VERY good.    (018)

ACRUD-COMM-MTD
ACRUD-COMM-YTD
OTH-LIAB-AMT
OTH-LIAB-AMT-US
ANT-ADJ-AMT
ANTIC-TOT-AMT
ANT-ADJ-FL
FACE-AMT
ACC-ANTIC-DT
ANTIC-DT-RT
ACC-ANT-ITEM-NO
ACCP-CAT    (019)


Distilling the 800 data elements in this system down to the  
fragments, produces this pattern:    (020)

FL   occurs 224 times
NO   occurs 154 times
AMT  occurs 142 times
CD   occurs 110 times    (021)
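
The tally above can be sketched in a few lines. This is only an illustration, assuming that fragments are separated by hyphens and that nothing else needs normalizing; the function name `fragment_counts` is hypothetical, and the input is just the twelve sample names quoted in this message, not the full 800-element system, so the totals will not match the figures above.

```python
from collections import Counter

# The twelve sample data element names quoted in the message above.
ELEMENT_NAMES = [
    "ACRUD-COMM-MTD", "ACRUD-COMM-YTD", "OTH-LIAB-AMT", "OTH-LIAB-AMT-US",
    "ANT-ADJ-AMT", "ANTIC-TOT-AMT", "ANT-ADJ-FL", "FACE-AMT",
    "ACC-ANTIC-DT", "ANTIC-DT-RT", "ACC-ANT-ITEM-NO", "ACCP-CAT",
]


def fragment_counts(names):
    """Split each name on '-' and count how often each fragment occurs."""
    counts = Counter()
    for name in names:
        # Assumption: a hyphen is the only separator between fragments.
        counts.update(part for part in name.split("-") if part)
    return counts


if __name__ == "__main__":
    # List fragments from most to least frequent.
    for fragment, n in fragment_counts(ELEMENT_NAMES).most_common():
        print(f"{fragment:<6} occurs {n}")
```

Even in this small sample the tally shows near-duplicates such as ANT and ANTIC being counted as separate fragments, which is the sort of thing a controlled glossary is meant to catch.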


> it seems that Zipf's law for NL terms also applies to most computer
> languages.    (022)


Excellent.    (023)

I found an article, but it appears to only have looked at the  
language's (dating myself) reserved words (MOVE, ADD, COMPUTE...).  I  
want to look at the sort of language that ends up in variable names.    (024)
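
One rough way to test whether variable-name fragments follow Zipf's law is to fit a straight line to log(frequency) against log(rank); a slope near -1 is the classic Zipfian signature. The sketch below is an assumption-laden illustration, not anything taken from the article mentioned above: the function name `zipf_slope` is hypothetical, and an ordinary least-squares fit is only one of several possible estimators.

```python
import math


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Frequencies are ranked from most to least frequent (rank 1 = largest).
    A slope close to -1 is what Zipf's law would predict.
    """
    freqs = sorted(frequencies, reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two frequencies to fit a line")
    if freqs[-1] <= 0:
        raise ValueError("frequencies must be positive")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Standard simple linear regression: slope = Sxy / Sxx.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # The four fragment counts quoted earlier (FL, NO, AMT, CD).
    # Four points are far too few to draw any conclusion; a real check
    # would need the full fragment list from the system.
    print(zipf_slope([224, 154, 142, 110]))
```

Feeding it the full fragment tally from a real system, rather than four numbers, would be the meaningful experiment.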


___________________
David Eddy
deddy@xxxxxxxxxxxxx    (025)

781-455-0949    (026)


_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx    (027)