
[ontolog-forum] Twenty years of bitext

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: John F Sowa <sowa@xxxxxxxxxxx>
Date: Thu, 22 Aug 2013 14:26:13 -0400
Message-id: <521657C5.2050005@xxxxxxxxxxx>
Bitext is an abbreviation for "bilingual text".  That is the basis
for the now widely used statistical methods for machine translation.
Two influential publications from 1993 established the approach:
one by researchers at IBM    (01)

    http://acl.ldc.upenn.edu/J/J93/J93-2003.pdf    (02)

and one by researchers at Bell Labs    (03)

    http://acl.ldc.upenn.edu/J/J93/J93-1004.pdf    (04)

Following is an announcement for a conference on the topic:    (05)

Source: https://sites.google.com/site/20yearsofbitext/
> 1993 was a watershed year in the development of empirical methods
> for processing parallel corpora. Seminal publications by Gale and Church
> at Bell Labs (CL, 1993) and Brown and colleagues at IBM (CL, 1993)
> established the methodology, models, and algorithms that form the basis
> of the modern statistical approaches to machine translation and multilingual
> text processing. In that year the first Workshop on Very Large Corpora
> (which would ultimately become EMNLP) was also held, a sign of the broader
> sea change that transformed how problems in natural language processing
> are approached.    (06)
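The Gale & Church paper established length-based sentence alignment:
sentences and their translations tend to have proportional lengths, so
a dynamic program over a bilingual text can recover the alignment from
lengths alone.  The sketch below illustrates that idea; the cost
function and penalty value are simplified placeholders, not the
published parameters.

```python
import math

def align(src, tgt, mismatch_penalty=2.3):
    """Align two lists of sentences by character length, in the spirit
    of Gale & Church (1993), via dynamic programming over 1-1, 1-0,
    0-1, 2-1, and 1-2 alignment "beads"."""
    def cost(ls, lt):
        # Penalize deviation of the target length from the source
        # length; a crude stand-in for Gale & Church's Gaussian model.
        if ls == 0 or lt == 0:
            return mismatch_penalty
        ratio = (lt - ls) / math.sqrt(ls)
        return ratio * ratio / 2
    n, m = len(src), len(tgt)
    INF = float("inf")
    best = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Each move consumes (di, dj) sentences and carries a fixed
    # penalty for the rarer non-1-1 bead types.
    moves = [(1, 1, 0.0), (1, 0, mismatch_penalty), (0, 1, mismatch_penalty),
             (2, 1, mismatch_penalty), (1, 2, mismatch_penalty)]
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj, pen in moves:
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0 or best[pi][pj] == INF:
                    continue
                ls = sum(len(s) for s in src[pi:i])
                lt = sum(len(t) for t in tgt[pj:j])
                c = best[pi][pj] + cost(ls, lt) + pen
                if c < best[i][j]:
                    best[i][j] = c
                    back[i][j] = (pi, pj)
    # Trace back the optimal bead sequence.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((tuple(src[pi:i]), tuple(tgt[pj:j])))
        i, j = pi, pj
    return beads[::-1]
```

On a pair where lengths match one-to-one, the aligner returns two 1-1
beads; when two short source sentences translate as one target
sentence, it prefers a single 2-1 bead.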

The original idea of using bilingual texts was suggested by John Cocke
at IBM in the 1970s.  John was one of the pioneers in computer hardware
and software.  Among his many achievements is the Cocke-Kasami-Younger
algorithm for parsing.  See http://en.wikipedia.org/wiki/CYK_algorithm .
In the mid-1970s, he led the design of the 801, a RISC machine that
evolved into IBM's PowerPC and the Power chips that are used in IBM's
supercomputers.    (07)
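The CYK algorithm mentioned above is itself a compact illustration of
the symbolic methods: a bottom-up chart recognizer for grammars in
Chomsky Normal Form.  A minimal sketch follows; the dictionary
encoding of the grammar is an assumption made for illustration.

```python
from itertools import product

def cyk(words, unary, binary, start="S"):
    """Return True if `words` is derivable from `start`.

    unary:  dict terminal -> set of preterminals, e.g. {"fish": {"N"}}
    binary: dict (B, C) -> set of A, for each CNF rule A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # table[(i, j)] holds the nonterminals deriving the span words[i:j]
    table = {}
    for i, w in enumerate(words):
        table[(i, i + 1)] = set(unary.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = set()
            # Try every split point and every pair of child labels.
            for k in range(i + 1, j):
                for B, C in product(table[(i, k)], table[(k, j)]):
                    cell |= binary.get((B, C), set())
            table[(i, j)] = cell
    return start in table[(0, n)]
```

With a toy grammar (S -> NP VP, NP -> Det N, VP -> V NP), the
recognizer accepts "the dog saw the cat" and rejects scrambled input.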

Following are three early papers about the statistical methods:    (08)

Bahl, Lalit R., John Cocke, Frederick Jelinek, Josef Raviv (1974).
"Optimal decoding of linear codes for minimizing symbol error rate".
IEEE Transactions on Information Theory 20(2):284–287.    (09)

Bahl, Lalit R., John Cocke, Frederick Jelinek, Josef Raviv (1976).
"Continuous speech recognition by statistical methods". Proceedings
of the IEEE 64(4):532–556.    (010)

Brown, P., J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek,
R. Mercer and P. Roossin (1988). "A statistical approach to language
translation". In Dénes Vargha, ed. Coling 88: Proceedings of the 12th
conference on Computational linguistics, volume 1. Budapest: John Von
Neumann society for computing sciences. pp. 71–76.    (011)

Note that John C. pioneered symbolic methods *and* suggested the basis
for the statistical methods.  Also note that John C. never published
anything by himself.  He was brilliant, but he was very disorganized
as a speaker or author.  He would begin in the middle of a subject
and wander around in a loosely linked train of thought.    (012)

Fred Jelinek was the manager of the speech recognition project at IBM.
His goal was to recognize continuous speech as it is normally spoken.
Two of the researchers in his group were Jim and Janet Baker, who
advocated a simpler method that required the speakers to make short
pauses between words.  Janet B. wrote an IBM technical report whose
title summarizes the problems:    (013)

    "How to recognize speech, not wreck a nice beach."    (014)

Fred J. insisted on continuous speech recognition as the research
direction, and he did not approve a separate project on discrete word
recognition.  So Jim and Janet left IBM to found their own company,
Dragon Systems.    (015)

Their original product required slight pauses between words, but they
reinvested the profits to continue improving the methods.  As a result,
Dragon Systems developed continuous speech recognition systems that
were better than IBM's.    (016)

The R & D developments and the business arrangements are complex, with
collaboration, competition, buyouts, and cross-pollination at many
different levels.  But there are many points to consider:    (017)

  1. Statistical methods and symbolic methods are complementary.
     Many of the same people developed, contributed to, and adopted
     various combinations of both.    (018)

  2. Large corporations, such as IBM, AT&T, and Xerox, pioneered much
     of the research, but lost their advantage to smaller companies:
     IBM designed SQL, but Oracle became the largest RDBMS company.
     AT&T invented the transistor, Fairchild became the largest supplier,
     but a spinoff group from Fairchild founded Intel.  Xerox PARC
     designed the WIMPy interface, but Apple reaped the benefits.    (019)

  3. Technology transfer from research to development is notoriously
     difficult:    (020)

     a) Researchers and product developers have different goals:
        Fred J. wanted to do pure research, but Jim & Janet wanted
        to build a practical product.    (021)

     b) Large corporations have internal battles:  In the 1970s,
        IBM's "cash cow" for selling computers and disk drives was
        the IMS database system.  Internal politics prevented IBM
        from bidding on a contract to deliver a relational DBMS
        to the CIA.  A small company called Oracle won the bid.    (022)

     c) The researchers and developers don't understand each other.
        Development managers who are familiar with the previous
        technology are reluctant to adopt unfamiliar methods.    (023)

     d) Neither the R nor the D people understand what customers need.
        Steve Jobs' greatest strengths were his ability to think like
        a user and his power to force the developers to go back to
        the drawing board until they got something he liked.    (024)

  4. A small company with a single product line can be more focused.
     At Dragon Systems, the top executives were intimately familiar
     with the technology, and they got immediate feedback from their
     customers.  They used the profits to develop new products that
     their customers wanted.  But note that the failure rate of small
     companies is very high:  an ideal combination of technology,
     management, developers, and products is rare.    (025)

  5. The rapid pace of product development, release, and obsolescence
     obscures the fact that fundamental research progresses at a much
     slower pace:    (026)

     a) The Bitext conference is celebrating 20 years since the
        publications of 1993, but the early papers were published
        almost 40 years ago.    (027)

     b) The latest and greatest chips from Intel and IBM implement
        instruction sets based on designs from the 1970s.    (028)

     c) New developments in one area can shift the tradeoffs among
        the many options in other areas.  The statistical methods
        that required large mainframe computers in the 1970s could
        run on high-speed workstations in the 1990s and on laptop
        computers today.    (029)

But no single method is ideal for all purposes.  My major
qualification to the announcement above is about the "sea change"
that "transformed how problems in NLP are approached."    (030)

For humans, language understanding and generation are intimately
connected with every aspect of perception, action, and reasoning
about every aspect of human experience.  I agree with Marvin Minsky
(and many others) that no single paradigm or algorithm is sufficient
for analyzing or generating language about all of them.    (031)

In short, the world has many seas and even more watersheds.
The waters from all of them are in constant flux.    (032)

John    (033)

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (034)
