[ontolog-forum] Natural languages and formal languages

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Sun, 21 Nov 2010 12:20:00 -0500
Message-id: <4CE954C0.3090801@xxxxxxxxxxx>

Some recent notes on Corpora List raised the old slogan by linguists:
"All grammars leak."  Yorick Wilks pointed out that one of his students
tried to induce phrase-structure grammar rules from the Penn Tree Bank,
but "the number of rules was enormous and, most significantly, I
thought, still rising linearly at the end of the PTB corpus".
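
To make that observation concrete, here is a minimal sketch of what
"inducing rules from the trees" can look like: read each bracketed tree,
collect the phrase-structure rules it uses, and record how many distinct
rules have been seen so far. This is only an illustration, not the
procedure Wilks's student actually used. The function names and the
three-sentence toy corpus are made up for the example, and lexical rules
such as DT -> the are deliberately ignored.

```python
def parse_tree(text):
    """Parse one bracketed tree, e.g. "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))".

    Returns (label, children), where each child is either a nested
    (label, children) tuple or a word (a plain string).
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening parenthesis"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return read()


def productions(tree):
    """Yield phrase-structure rules (lhs, rhs) for every node that has subtrees.

    Nodes whose children are all words (part-of-speech tags over a word)
    yield nothing, so lexical rules are not counted.
    """
    label, children = tree
    subtrees = [child for child in children if isinstance(child, tuple)]
    if subtrees:
        yield (label, tuple(child[0] for child in subtrees))
        for child in subtrees:
            yield from productions(child)


def rule_growth(bracketed_trees):
    """Return the cumulative number of distinct rules after each tree."""
    seen = set()
    growth = []
    for text in bracketed_trees:
        seen.update(productions(parse_tree(text)))
        growth.append(len(seen))
    return growth


# Hypothetical toy corpus, invented for this sketch (not taken from the PTB).
TOY_CORPUS = [
    "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))",
    "(S (NP (DT a) (NN cat)) (VP (VBZ sleeps)))",
    "(S (NP (DT the) (JJ old) (NN dog)) (VP (VBZ chases) (NP (DT a) (NN cat))))",
]
```

Calling rule_growth(TOY_CORPUS) returns [3, 3, 5]: the second sentence
adds no new rules, but the third adds two. If a grammar were truly
finite, this curve should flatten as the corpus grows; the report above
is that on the PTB it was still climbing at the end.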

At the end of this note is my response, in which I quoted the linguist
Edward Sapir, who had spent many years writing grammars for Native
American languages.  He seems to be the source of the slogan about
leaky grammars, and his observations are important.

I believe we can generalize that principle even further:

    All formal ontologies leak.

In other words, there is no such thing as a fixed ontology that
can precisely characterize any domain of knowledge that is growing,
developing, and being used in practical applications.  Even for a
single application, the ontology changes with every version, update,
and patch to the software.

That observation does not imply that we should stop writing ontologies,
but it does imply that we need to support dynamic methods for ontology
repair, revision, extension, and growth.  In earlier notes, I cited
some papers by Alan Bundy, who reached similar conclusions from his
work in developing ontologies for theorem proving and problem solving
in physics.  Following is a list of his publications:

    http://www.inf.ed.ac.uk/publications/author/bundy.html

Following is a brief article that I wrote about dynamic ontology:

    http://www.jfsowa.com/pubs/dynonto.htm

There is much more to say about this topic, but I'd just like to
mention how we have been dealing with these issues in our work at
VivoMind.  The basic principle is that we have to treat NL grammars
and ontologies as very leaky approximations:  There is no limit on
the amount or kind of knowledge and grammatical patterns that may
be needed to understand ordinary language.

We do, of course, support language about formal systems, such as
computer software and databases.  One way we do that is to use a
knowledge base that is derived from the formal systems themselves.
For an example, see the work on legacy re-engineering in slides 24
to 30 of the following talk:

    http://www.jfsowa.com/talks/pursue.pdf

Note that each COBOL program is internally consistent, but the
totality of all the programs is only consistent at the interfaces,
not in the internal assumptions about the data.  The documents that
describe the programs are of varying quality and accuracy, and they
may mix descriptions of different versions of the software in the
same document.

We have to expect that kind of mixture in NLP -- and as Bundy
emphasizes, in formal reasoning as well.  We might be able to enforce
a "tyrannical" or controlled grammar and ontology for a special-purpose
application over which we have total control -- but we can't expect
them to last beyond the next update to the application.

Bottom line:  Learning is essential for any kind of intelligent
system -- for the simple reason that no fixed KB can be adequate.

John Sowa

-------- Original Message --------
Subject: Re: [Corpora-List] RE : Annotation layers: missing reference
Date: Sun, 21 Nov 2010 09:26:34 -0500
From: John F. Sowa <sowa@xxxxxxxxxxx>
To: corpora@xxxxxx

On 11/21/2010 7:55 AM, Yorick Wilks wrote:
> I don't think the dating of "corpora and grammar" so early is right.

I doubt that you can find a clear "birthday" for any significant idea,
theory, methodology, or movement.

> Alex Krotov found that if you induced the PS grammar rules from the PTB,
> in a pretty straightforward way from the trees, then the number of rules
> was enormous and, most significantly, I thought, still rising linearly
> at the end of the PTB corpus, which didn't prove anything but made one
> wonder about all the claims of finite grammar and infinite language
> that we had all been indoctrinated with.

The claim of a finite grammar for an infinite language is a Chomskyan
claim that never had any empirical justification.  As a linguist who
wrote grammars for actual languages, Sapir (1921) had a much deeper
understanding of the nature of grammar:

> The fact of grammar, a universal trait of language, is simply
> a generalized expression of the feeling that analogous concepts and
> relations are most conveniently symbolized in analogous forms. Were
> a language ever completely "grammatical," it would be a perfect engine
> of conceptual expression. Unfortunately, or luckily, no language is
> tyrannically consistent. All grammars leak.

I believe that Sapir's second adverb "luckily" is more appropriate.
For examples of tyrannical languages, see Orwell's Newspeak or the
efforts by Frege, Russell, Carnap, and the Vienna Circlers.

John Sowa

Note:  I found the quotation from Sapir in Linguist List:
http://linguistlist.org/issues/4/4-85.html

