John F. Sowa
Wed, 07 Mar 2007
Duane, Keith, Peter, and Leo,    (01)

When we consider collaborative development (by folk) rather
than legislated development (by a standards body), it makes
sense to look at the evolution of natural languages.    (02)

The difference between pidgins and creoles, for example,
is important.  The term "pidgin English" is a mispronounced
version of "business English", and the word "pidgin" has
become the usual term for a mixture used for commercial
purposes among the speakers of two or more natural
languages (such as English and Chinese with mixtures
of Portuguese and Dutch).    (03)

But a "creole" is a language that has native speakers
(often the descendants of mixed ethnic background in a
city where the pidgin is widely spoken).  Pidgins usually
have very little grammar, an impoverished vocabulary, and
idiosyncratic variations from one speaker to another.  But
creoles have a fairly complete grammar, more commonality
among speakers, and a vocabulary that is sufficiently rich
to cover all the needs of the community (with new words
being borrowed from the parent languages as needed).    (04)

So I would say that the collaborative tagging systems
start as pidgins, but they have the potential for evolving
into creoles.    (05)

KDW> ... examples of the tags that you've presented-- “stupid”,
 > “idiots”, “ROTFL” etc--my question is: What value do these
 > particular tags have?    (06)

DN> These stories do have a “haha” factor to them IMO but your
 > actual mileage might vary.    (07)

All natural languages have emotive and evaluative words, and
there's no reason why all tags must be neutral and "objective".
They could be useful in looking for reactions concerning
movies, politics, or programming languages and tools.    (08)

But I don't think that people are always going to search
by means of keywords, as they do with the current crop of
search engines.  Algorithms that are more sensitive to the
frequency of such terms and the words they are associated
with are likely to become much more important.    (09)
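
As a rough illustration of what "sensitive to the frequency of such terms and the words they are associated with" could mean, here is a minimal Python sketch. The sample taggings and the function names are hypothetical, made up for this example, and the scoring rule (the share of a tag's uses in which another tag also appears) is just one simple choice among many.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each set holds the tags one user gave one resource.
taggings = [
    {"python", "tutorial", "funny"},
    {"python", "tutorial"},
    {"python", "stupid", "funny"},
    {"politics", "stupid", "funny"},
]

def tag_statistics(taggings):
    """Count how often each tag occurs and how often each pair co-occurs."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in taggings:
        tag_counts.update(tags)
        # Sort so that each unordered pair has one canonical key.
        pair_counts.update(combinations(sorted(tags), 2))
    return tag_counts, pair_counts

def associated_tags(tag, tag_counts, pair_counts):
    """Rank other tags by the fraction of `tag`'s uses that they share."""
    scores = {}
    for (a, b), n in pair_counts.items():
        if tag in (a, b):
            other = b if a == tag else a
            scores[other] = n / tag_counts[tag]
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

tag_counts, pair_counts = tag_statistics(taggings)
print(associated_tags("funny", tag_counts, pair_counts))
```

With counts like these, an emotive tag such as "funny" or "stupid" is not searched for directly but is used as a signal about the terms it tends to accompany.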

DN> In any event, the danger of a small group making declarations
 > on representation terms for certain resources is that they fail
 > to comprehend the way others might think of the resource.  In
 > these cases, I think that folksonomies provide a great way for
 > sampling a larger set of how society might label something.    (010)

I agree.  And I think that if RDF and OWL had evolved from a pidgin
instead of being legislated, they might have become considerably
more user-friendly.  I believe that standards are important, but
only *after* a lot of alternatives have been tested, and the
best options have survived the trials by hordes of people who
pick and choose what they like.    (011)

DN> If the data collected from a folksonomy can be turned into a
 > thesaurus, that would probably make a much better knowledge
 > sharing mechanism.    (012)

I wouldn't say "be turned into a thesaurus".  That is a kind of
book invented by Roget in the mid-19th century and printed on paper.
We are more likely to see very different kinds of resources being
extracted and derived from the resources on the WWW.    (013)

PFB> - how does an artifice such as a formally modelled ontology
 > (cf Esperanto) become used in a manner that grows organically
 > as an interoperable means of discourse (cf post-1945 Hebrew)?    (014)

I wouldn't say that the development of Esperanto was really
"formal", but I would say that it was legislated.  And I don't
think that the choices were ideal.    (015)

My preference would have been for *Latino sine flexione*, which
was proposed by Giuseppe Peano about a century ago.  Peano went
to a conference in Paris, where he began by speaking in classical
Latin, and nobody understood what he was saying.  But with each
sentence, he made a suggestion for how to simplify Latin grammar,
and the audience (all of whom had some familiarity with at least
one Romance language) began to understand him better and better.
By the end of the speech, he was speaking a language that was
very close to the common core of French, Italian, Spanish, and
Portuguese, but with the word forms taken directly from Latin.    (016)

Unfortunately, Esperanto had more political clout at that time,
and Peano's proposal was ignored.  But I think Peano's approach,
if adapted to ontologies and related technologies, would be a good
way to go:  start with something that has proved its value, make
improvements in a step-by-step way, and test each improvement on
the intended audience before making any irrevocable decisions.    (017)

PFB> - in the other direction, how does an informal "language"
 > ("folksonomy", community tagging etc, cf orally transmitted
 > Hebrew/Yiddish before 1945) become codified in a manner that
 > provides stability, predictability and interoperability without
 > losing its organic character?    (018)

Again, I would use Latin as an example.  The ordinary Romans
certainly didn't speak the way Cicero did in his orations
in the Roman Senate.  The written record is based on highly
polished and edited constructions derived from the resources
of the common folk.  As the Romance languages evolved from Latin,
the word forms changed, and the grammar became simplified.    (019)

What Peano did was to go back to the common word stock in
its original form and simplify the grammar into the form
that had evolved, more or less in parallel, among the
various Romance languages.    (020)

LJO> In the general field of linguistics, there is
 > 1) prescriptive (English teacher's "you must ...")
 > 2) descriptive (the science of linguistics) view
 > 3) normative usage as gauged by lexicographers.    (021)

For ontology development, I would advocate normative
usage based on the descriptive studies -- which is
fairly close to what lexicographers do.  And that is
not too much different from what Peano did.    (022)

John    (023)

