
[ontolog-forum] Alternate Classification Schemes - an argument from a re

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Gary Berg-Cross <gbergcross@xxxxxxxxx>
Date: Fri, 23 Aug 2013 08:54:44 -0400
Message-id: <CAMhe4f3HREizCSnHK3bySSuzqO5+nzKuq3ve69MFRiTPfHOOFA@xxxxxxxxxxxxxx>

We’ve had numerous discussions about taxonomies on this forum.  A recent one was around “Taxonomies, cuts, and the decimal system.” As part of this, William Frank made some comments on different “orthogonal taxonomies, or classification schemes, for different families of concepts.”  William went on to note that:

Combinations of classifiers that are part of orthogonal classification schemes need to be accommodated in a different manner (most effectively, in my experience, through composition, but most commonly through "multiple inheritance" (but me, I do not know what "inheritance" means, except in biology and in class-oriented programming languages -- ((though I used to know, before I thought about it much)).

I was reminded of the issue of taxonomic understanding while reading a review of the DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, by the American Psychiatric Association) written by Ian Hacking. It appeared in the London Review of Books under the title “Lost in the Forest.”


After discussing the history of the DSM and describing the efforts to revise it, Hacking makes his most critical comments on what he describes as its misapplication of classification schemes from biology to mental illness. He puts it this way (note: NOS stands for Not Otherwise Specified):

“There have been many systems for classifying mental illness since then, but all seem to me to be on the botanical model, and that has been their fatal flaw. Many other kinds of illness are very like plants, and can be uniquely characterised, as Kraepelin tried to do, by a distinctive pattern of symptoms when a cause is not yet known. We don’t use NOS in the rest of medicine, and we do not have much systematic comorbidity. Perhaps in the end the DSM will be regarded as a reductio ad absurdum of the botanical project in the field of insanity. I do not say this because I believe that most psychiatry will, some day, be reduced to neuroscience, biochemistry and genetics. I take no stance on that here. The NIMH said it would stop using DSM because it lacked ‘validity’. In fact the DSM-5 has made a great effort to make sure it meets the criteria for what it sees as validity. That is not my problem. I am making a claim grounded more on logic than on medicine. Sauvages’s dream of classifying mental illness on the model of botany was just as misguided as the plan to classify the chemical elements on the model of botany. There is an amazingly deep organisation of the elements – the periodic table – but it is quite unlike the organisation of plants, which arises ultimately from descent. Linnaean tables of elements (there were plenty) did not represent nature.”

I think this idea of alternate, deep organization schemes within different domains is an important thing to consider when we try to build ontologies of them. Of course, this makes the backbone organization of a domain more challenging unless we have some grounds for its deep organizational principles. With mental illness we do not have an agreed-upon model.

Gary Berg-Cross, Ph.D.  
SOCoP Executive Secretary
Knowledge Strategies    
Potomac, MD

On Thu, Aug 22, 2013 at 2:26 PM, John F Sowa <sowa@xxxxxxxxxxx> wrote:
Bitext is an abbreviation for "bilingual text".  That is the basis
for the now widely used statistical methods for machine translation.
Two influential publications from 1993 by researchers at IBM


and Bell Labs


Following is an announcement for a conference on the topic:

Source: https://sites.google.com/site/20yearsofbitext/
> 1993 was a watershed year in the development of empirical methods
> for processing parallel corpora. Seminal publications by Gale and Church
> at Bell Labs (CL, 1993) and Brown and colleagues at IBM (CL, 1993)
> established the methodology, models, and algorithms that form the basis
> of the modern statistical approaches to machine translation and multilingual
> text processing. In that year the first Workshop on Very Large Corpora
> (which would ultimately become EMNLP) was also held, a sign of the broader
> sea change that transformed how problems in natural language processing
> are approached.

The original idea of using bilingual texts was suggested by John Cocke
at IBM in the 1970s.  John was one of the pioneers in computer hardware
and software.  Among his many achievements is the Cocke-Kasami-Younger
algorithm for parsing.  See http://en.wikipedia.org/wiki/CYK_algorithm .
In the 1970s, he designed a RISC machine that evolved into IBM's PowerPC
and the Power chips that are used in IBM's supercomputers.

Following are three early papers about the statistical methods:

Bahl, Lalit R., John Cocke, Frederick Jelinek, Josef Raviv (1974).
"Optimal decoding of linear codes for minimizing symbol error rate".
IEEE Transactions on Information Theory 20(2):284–287.

Bahl, Lalit R., John Cocke, Frederick Jelinek, Josef Raviv (1976).
"Continuous speech recognition by statistical methods". Proceedings
of the IEEE 64(4):532–556.

Brown, P., J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek,
R. Mercer and P. Roossin (1988). "A statistical approach to language
translation". In Dénes Vargha, ed. Coling 88: Proceedings of the 12th
Conference on Computational Linguistics, volume 1. Budapest: John von
Neumann Society for Computing Sciences. pp. 71–76.

Note that John C. pioneered symbolic methods *and* suggested the basis
for the statistical methods.  Also note that John C. never published
anything by himself.  He was brilliant, but he was very disorganized
as a speaker or author.  He would begin in the middle of a subject
and wander around in a loosely linked train of thought.

Fred Jelinek was the manager of the speech recognition project at IBM.
His goal was to recognize continuous speech as it is normally spoken.
Two of the researchers in his group were Jim and Janet Baker, who
advocated a simpler method that required the speakers to make short
pauses between words.  Janet B. wrote an IBM technical report whose
title summarizes the problems:

    "How to recognize speech, not wreck a nice beach."

Fred J. insisted on continuous speech recognition as the research
direction, and he did not approve a separate project on discrete word
recognition.  So Jim and Janet left IBM to found their own company,
Dragon Systems.

Their original product required slight pauses between words, but they
reinvested the profits to continue improving the methods.  As a result,
Dragon Systems developed continuous speech recognition systems that
were better than IBM's.

The R & D developments and the business arrangements are complex with
collaboration, competition, buyouts, and cross-pollination at many
different levels.  But there are many points to consider:

  1. Statistical methods and symbolic methods are complementary.
     Many of the same people developed, contributed to, and adopted
     various combinations of both.

  2. Large corporations, such as IBM, AT&T, and Xerox, pioneered much
     of the research, but lost their advantage to smaller companies:
     IBM designed SQL, but Oracle became the largest RDBMS company.
     AT&T invented transistors, Fairchild became the largest supplier,
     but a spinoff group from Fairchild founded Intel.  Xerox PARC
     designed the WIMPy interface, but Apple reaped the benefits.

  3. Technology transfer from research to development is notoriously
     difficult:

     a) Researchers and product developers have different goals:
        Fred J. wanted to do pure research, but Jim & Janet wanted
        to build a practical product.

     b) Large corporations have internal battles:  In the 1970s,
        IBM's "cash cow" for selling computers and disk drives was
        the IMS database system.  Internal politics prevented IBM
        from bidding on a contract to deliver a relational DBMS
        to the CIA.  A small company called Oracle won the bid.

     c) The researchers and developers don't understand each other.
        Development managers who are familiar with the previous
        technology are reluctant to adopt unfamiliar methods.

     d) Neither the R nor the D people understand what customers need.
        Steve Jobs' greatest strengths were his ability to think like
        a user and his power to force the developers to go back to
        the drawing board until they got something he liked.

  4. A small company with a single product line can be more focused.
     At Dragon Systems, the top executives were intimately familiar
     with the technology, and they got immediate feedback from their
     customers.  They used the profits to develop new products that
     their customers wanted.  But note that the failure rate of small
     companies is very high:  an ideal combination of technology,
     management, developers, and products is rare.

  5. The rapid pace of product development, release, and obsolescence
     obscures the fact that fundamental research progresses at a much
     slower pace:

     a) The Bitext conference is celebrating 20 years since the
        publications of 1993, but the early papers were published
        almost 40 years ago.

     b) The latest and greatest chips from Intel and IBM implement
        instruction sets based on designs from the 1970s.

     c) New developments in one area can shift the tradeoffs among
        the many options in other areas.  The statistical methods
        that required large mainframe computers in the 1970s could
        run on high-speed workstations in the 1990s and on laptop
        computers today.

But no single method is ideal for all purposes.  My major
qualification to the announcement above is about the "sea change"
that "transformed how problems in NLP are approached."

For humans, language understanding and generation is intimately
connected with every aspect of perception, action, and reasoning
about every aspect of human experience.  I agree with Marvin Minsky
(and many others) that no single paradigm or algorithm is sufficient
for analyzing or generating language about all of them.

In short, the world has many seas and even more watersheds.
The waters from all of them are in constant flux.


Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
