David,
I would add many, many qualifications to that comment.
JFS>> Omega is a 120,000-node terminological ontology
DE> It is my observation (with a sliver of supporting facts) that UNL
> (unnatural language) is at the opposite end of this scale, probably
> under 500 words for a large, mature software application. My guess
> is that a large immature application would likely have more words --
> and redundancy, multiple labels/terms for the single concept "social
> security number" -- since the builders have been sloppy from moving
> too fast & lack of solid architectural specifications.
For a particular application, 500 words might be sufficient. But the
vocabulary of our natural languages must support all the applications
that anybody has ever implemented or even imagined.
Just the vocabulary of organic chemistry requires millions of words
for all the known molecules. But I'll admit that the chemists have
algorithms for defining new words from a much smaller number of
primitives.
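As a minimal sketch of that point (my own toy example, not any chemist's actual software): systematic nomenclature generates an open-ended vocabulary from a handful of primitives, e.g. straight-chain alkane names from Greek multiplying prefixes plus the suffix "-ane".

```python
# Toy illustration: deriving chemical names from a small set of primitives.
# The prefix table is the standard IUPAC set for 1-10 carbons.
PREFIXES = {1: "meth", 2: "eth", 3: "prop", 4: "but", 5: "pent",
            6: "hex", 7: "hept", 8: "oct", 9: "non", 10: "dec"}

def alkane_name(n_carbons):
    """Name the straight-chain alkane CnH(2n+2) for small n."""
    return PREFIXES[n_carbons] + "ane"

print(alkane_name(1))  # methane
print(alkane_name(8))  # octane
```

Full IUPAC nomenclature composes many more primitives (branches, locants, functional-group suffixes), which is how a few hundred morphemes can name millions of molecules.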
But the vocabulary of biology requires millions of terms for the known
species, and nobody today knows how to define *any* species uniquely.
For the small number of species for which they have determined the
genome, they are very far from being able to specify necessary and
sufficient conditions for determining what DNA changes distinguish
different species.
Furthermore, I don't believe that there is any "unnatural language"
that humans are capable of learning and using effectively. Even the
languages of mathematics and logic evolved from abbreviations of NL
words; e.g., "2+2=4" as an abbreviation of "Two and two is four."
> Did your/Arun's work with the reverse engineering project show that
> the actual words used in an application are surprisingly small?
Arun used my KR ontology as a basic upper-level ontology, and he
added some basic types for defining processes and data structures.
But he related that ontology to all the COBOL terms so that he could
map COBOL to conceptual graphs. Then he added new vocabulary items
for every file and data structure defined in the COBOL programs
and JCL scripts. So the actual vocabulary was quite large.
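To make that concrete, here is a hypothetical toy sketch (not Arun's actual system) of the general idea: each COBOL data declaration introduces a new vocabulary item, typed under a small upper-level ontology, so the vocabulary grows with every record and field in the programs.

```python
# Hypothetical sketch: COBOL data items as (level, name, picture) triples.
# Level 01 items are records; subordinate items are data fields.
cobol_record = [
    ("01", "EMPLOYEE-RECORD", None),
    ("05", "EMP-NAME", "PIC X(30)"),
    ("05", "EMP-SSN", "PIC 9(9)"),
]

def to_concepts(record):
    """Map each COBOL data item to a (name, supertype) concept pair."""
    concepts = []
    for level, name, _pic in record:
        supertype = "Record" if level == "01" else "DataField"
        concepts.append((name, supertype))
    return concepts

for name, supertype in to_concepts(cobol_record):
    print(f"{name} is-a {supertype}")
```

Since every file and data structure contributes new terms this way, even a modest COBOL application yields a large vocabulary.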
> My assumption is that normal NL statistical practices will not work
> well when applied to such a small body of terms. Is this accurate?
I don't know all the data, but from various articles I've run across,
it seems that Zipf's law for NL terms also applies to most computer
languages. You might search for "Zipf's law" with other qualifiers,
such as "English", "programming languages", etc.
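For anyone who wants to check a corpus themselves, here is a minimal sketch: count token frequencies and compare the observed rank-frequency list against the idealized Zipfian prediction that the word of rank r occurs about 1/r as often as the most frequent word. The sample sentence is my own invented example.

```python
from collections import Counter

def rank_frequencies(text):
    """Count token frequencies, sorted from most to least frequent."""
    counts = Counter(text.lower().split())
    return sorted(counts.values(), reverse=True)

sample = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat saw the dog")
freqs = rank_frequencies(sample)
for rank, f in enumerate(freqs[:5], start=1):
    # observed frequency vs. the idealized Zipfian value freqs[0] / rank
    print(rank, f, round(freqs[0] / rank, 1))
```

The same counting works on a tokenized source file, so the comparison between English text and program text is straightforward to run.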
If anything violates Zipf's law, I would expect it to be RDF because
of the huge amount of redundancy caused by the XML conventions. But
that is the exception that proves the rule: Raw RDF is so unnatural
that people prefer to use notations that are much more "natural".
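The redundancy is easy to see by serializing one and the same triple two ways. This sketch (with a made-up example.org vocabulary) compares the RDF/XML form against the equivalent Turtle, one of those more "natural" notations:

```python
# The same single triple, serialized in RDF/XML and in Turtle.
rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/socrates">
    <ex:instanceOf rdf:resource="http://example.org/Human"/>
  </rdf:Description>
</rdf:RDF>"""

turtle = """@prefix ex: <http://example.org/> .
ex:socrates ex:instanceOf ex:Human ."""

# Most of the XML characters are markup overhead, not content.
print(len(rdf_xml), len(turtle))
```

That overhead repeats for every triple, which is why word-frequency statistics over raw RDF/XML would be dominated by the XML scaffolding.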
In any case, see the abstract and reference below for a preprint
of an article that appears in the following book:
http://www.amazon.com/Theory-Applications-Ontology-Philosophical-Perspectives/dp/904818844X/ref=sr_1_3?s=books&ie=UTF8&qid=1285437706&sr=1-3
John
______________________________________________________________________
Source: http://www.jfsowa.com/pubs/rolelog.pdf
The Role of Logic and Ontology in Language and Reasoning
John F. Sowa
Abstract. Natural languages have words for all the operators of
first-order logic, modal logic, and many logics that have yet to be
invented. They also have words and phrases for everything that anyone
has ever discovered, assumed, or imagined. Aristotle invented formal
logic as a tool (organon) for analyzing and reasoning about the
ontologies implicit in language. Yet some linguists and logicians took
a major leap beyond Aristotle: they claimed that there exists a special
kind of logic at the foundation of all NLs, and the discovery of that
logic would be the key to harnessing their power and implementing them
in computer systems. Projects in artificial intelligence developed
large systems based on complex versions of logic, yet those systems are
fragile and limited in comparison to the robust and immensely expressive
natural languages. Formal logics are too inflexible to be the
foundation for language; instead, logic and ontology are abstractions
from language. This reversal turns many theories about language upside
down, and it has profound implications for the design of automated
systems for reasoning and language understanding. This article analyzes
these issues in terms of Peirce's semiotics and Wittgenstein's language
games. The resulting analysis leads to a more dynamic, flexible, and
extensible basis for ontology and its use in formal and informal reasoning.
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx