Well, I was an undergrad chemist, and I don't think there are anywhere near
millions of unique concepts, probably not even 5,000. That is part of why
chemistry is pretty simple to learn once you have the base grammar: every
concept derives logically from that vocabulary as a pre-coordinated group
of concepts that can be parsed logically. Note that the 118 concepts will
only get you through the inorganic side of chemistry and do not help much
with organic nomenclature or the biochemistry side of the equation. (01)
I have a colleague who has looked at more than 500,000 medical records from
a series of hospitals and has distilled the vocabulary used in them to fewer
than 5,000 concepts as well, even though SNOMED has over 300,000 unique
concepts (so it seems that general usage limits the working vocabulary to
some extent). (02)
So I do think you can get pretty good results with a relatively small set of
concepts for the 2 standard deviations on either side of the bell curve in
any particular domain, but it would be difficult to extend that set to the
other 5% of the functioning of an organization, and even more difficult to
work across disciplines. (03)
Cecil
-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of David Eddy
Sent: Saturday, September 25, 2010 4:30 PM
To: [ontolog-forum]
Subject: [SPAM] Re: [ontolog-forum] OntoNotes and the Omega ontology (04)
John - (05)
On Sep 25, 2010, at 2:22 PM, John F. Sowa wrote: (06)
> For a particular application, 500 words might be sufficient. But the
> vocabulary of our natural languages must support all the applications
> that anybody has ever implemented or even imagined.
>
> Just the vocabulary of organic chemistry requires millions of words
> for all the known molecules. But I'll admit that the chemists have
> algorithms for defining new words from a much smaller number of
> primitives. (07)
As always I'm coming at this from the opposite end of the spectrum. (08)
It is my thesis that the concepts needed for an organization to
function are in the 1500 to 6000 range. Concepts unfortunately will
have MANY names, acronyms, abbreviations, etc. The 1500 is from a
smallish logistics company (with an implemented Zachman BSP, I might
add) where the "glossary" function in the central, mainframe data
dictionary contains approx. 1500+ terms. This term/concept list is
not frozen for all time... terms fall out of use, new ones come
along. But the point of this controlled technical glossary is that
analysts/programmers do not just make up new terms/abbreviations on a
whim... which is what happens in most shops. (09)
The 6,000 is allegedly the Pentagon. (010)
I have no knowledge or experience with organic chemistry so I'll have
to revert to business applications language... I assume not as
complex as organic chemistry. (011)
But then again, chemistry seems to survive pretty well with 118 basic
building blocks ("primitives"?) in the periodic table. Somehow from
those 118 core "concepts" language explodes to millions of terms. (012)
Most organizations focus on a relatively narrow niche of "reality"...
HP & IBM do hardware, software & consulting... but stocks & bonds are
not a core part of their business. Fidelity Investments does stocks
& bonds but not hardware & software. (013)
> Furthermore, I don't believe that there is any "unnatural language"
> that humans are capable of learning and using effectively. (014)
Agreed... but we're on a track where we continually jump from project
to project. The days of a dedicated individual staying with a system
for decades are ancient history. (015)
By "unnatural language" I mean thingys with labels like M0101, M0102,
etc. May make sense to you, since you've been working this system
for 5 years, but I'm a newbie... and since you're the SME I don't
have access to you. Even if I did have access to you, I wouldn't
know how to form a good question. (016)
When the new analyst/programmer opens up the code, how will they KNOW
what MSTR-MENSA-NO actually MEANS? Eventually they will learn, but
it will take TIME & mistakes. And by the time they've learned the
language of the system, they're likely to be rotated out, taking
their knowledge of quirks & oddities with them. (017)
Here's what I mean by unnatural language... and these data element
names are VERY good. (018)
ACRUD-COMM-MTD
ACRUD-COMM-YTD
OTH-LIAB-AMT
OTH-LIAB-AMT-US
ANT-ADJ-AMT
ANTIC-TOT-AMT
ANT-ADJ-FL
FACE-AMT
ACC-ANTIC-DT
ANTIC-DT-RT
ACC-ANT-ITEM-NO
ACCP-CAT (019)
Distilling the 800 data elements in this system down to their
fragments produces this pattern: (020)
FL occurs 224 times
NO occurs 154
AMT occurs 142
CD occurs 110 (021)
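A minimal sketch of how a fragment tally like this could be produced, assuming the names are split on hyphens. The function name `count_fragments` and the `ELEMENT_NAMES` list are illustrative only; the list holds just the twelve sample names quoted above, not the full 800 elements, so the counts it prints will not match the figures in the message.

```python
from collections import Counter

# The twelve sample data element names quoted in the message.
# Assumption: the full 800-element list is not available here.
ELEMENT_NAMES = [
    "ACRUD-COMM-MTD", "ACRUD-COMM-YTD", "OTH-LIAB-AMT", "OTH-LIAB-AMT-US",
    "ANT-ADJ-AMT", "ANTIC-TOT-AMT", "ANT-ADJ-FL", "FACE-AMT",
    "ACC-ANTIC-DT", "ANTIC-DT-RT", "ACC-ANT-ITEM-NO", "ACCP-CAT",
]


def count_fragments(names):
    """Split each name on '-' and count how often each fragment occurs."""
    counts = Counter()
    for name in names:
        # Skip empty strings that a doubled or trailing hyphen would produce.
        counts.update(part for part in name.split("-") if part)
    return counts


if __name__ == "__main__":
    # Print the five most common fragments in the same style as the message.
    for fragment, n in count_fragments(ELEMENT_NAMES).most_common(5):
        print(f"{fragment} occurs {n}")
```

Even on this small sample the sketch surfaces near-duplicates such as ANT and ANTIC, which is the kind of many-names-for-one-concept problem described earlier.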
> it seems that Zipf's law for NL terms also applies to most computer
> languages. (022)
Excellent. (023)
I found an article, but it appears to only have looked at the
language's (dating myself) reserved words (MOVE, ADD, COMPUTE...). I
want to look at the sort of language that ends up in variable names. (024)
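As a rough sketch of what such a look at variable-name language might start with, the fragments can be ranked by frequency and the product of rank and frequency printed; under an idealized Zipf distribution that product stays roughly constant. The names `FRAGMENT_COUNTS` and `rank_frequency` are made up for illustration, and the data is only the four counts quoted earlier in this message, far too few to support any conclusion.

```python
# Counts quoted earlier in the message; only four fragments were listed,
# so this is an illustration of the method, not evidence for or against Zipf.
FRAGMENT_COUNTS = {"FL": 224, "NO": 154, "AMT": 142, "CD": 110}


def rank_frequency(counts):
    """Return (rank, fragment, frequency, rank * frequency), most frequent first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, fragment, freq, rank * freq)
        for rank, (fragment, freq) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    for rank, fragment, freq, product in rank_frequency(FRAGMENT_COUNTS):
        print(f"{rank:>2}  {fragment:<4} {freq:>4}  rank*freq={product}")
```

A fuller test would need the complete fragment list from the 800 data elements, ideally plotted as log(rank) against log(frequency).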
___________________
David Eddy
deddy@xxxxxxxxxxxxx (025)
781-455-0949 (026)
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx (027)