[ontolog-forum] XML tags as natural language words

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
Cc: yorick@xxxxxxxxxxxxxx
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Sat, 13 Dec 2008 14:33:53 -0500
Message-id: <49440E21.40409@xxxxxxxxxxx>
One of the goals of formal ontologies has been the precise
definition and axiomatization of the tags used in the Semantic
Web and other media.  But the most widely used tags in RDF
and other XML-based notations are only defined by statements
in ordinary natural languages.    (01)

That observation implies that the XML tags are no more
formal or reliable than ordinary terms in any of the large
number of terminologies used in various fields of science,
engineering, medicine, business, and law.    (02)

Question:  How can we rely on any deductions that use the
formally defined axioms of some ontology when the input tags
have no formal definition?    (03)

Second question:  Even for those tags that are formally
defined, how can we be sure that the people who selected
the tags, either by annotating the raw data or by clicking
on a menu, had read or understood the formal definitions?    (04)

Those are issues that many people have raised.  Recently,
Yorick Wilks, who has been working in NL semantics for over
forty years, has written some papers to address those topics:    (05)

  1. "The Semantic Web as the apotheosis of annotation, but what
     are its semantics?"
     http://www.dcs.shef.ac.uk/~yorick/papers/IEEE.SW.untrak.pdf    (06)

  2. "On whose shoulders?"
     (The most relevant section is on pp. 9-12.)    (07)

Following is a quotation from p. 11 of ref #2:    (08)

The SW accords a key role to ontologies as knowledge structures:
partially hierarchical structures containing key terms -- primitives
again under another guise -- whose meanings must be made clear,
particularly at the more abstract levels.  The old AI tradition in
logic-based knowledge structuring -- descending from McCarthy and
Hayes (1969) -- was simply to declare what these primitive predicates
meant.  The problem was that predicates, normally English words
written in capital letters (as all linguistic primitives in the end
seem to be), became affected by their inferential roles over time
and the process of coding itself.  This became very clear in the
long-term CyC project (Lenat 1995) where the key predicates changed
their meanings over 30 years of coding, but there was no way of
describing that fact within the system, so as to guarantee consistency.
In Nirenburg and Wilks (2000), Nirenburg and I debate this issue in
depth, and I defend the position that one cannot simply maintain
the meanings of such terms by fiat and independent of their usage
-- they look like words and they function like words because, in
the end, they are words.    (09)

Ref #2 was published in the December 2008 issue of the journal
_Computational Linguistics_.  The same issue contains another paper,
"Inter-Coder Agreement for Computational Linguistics," on the
reliability of human annotations.  For a copy, see    (010)

    http://cswww.essex.ac.uk/Research/nle/arrau/icagr-short.pdf    (011)

Section 4.5 of that paper discusses the task of distinguishing word
senses for words that have more than one sense (i.e., nearly all).
For naive coders (i.e., people who are not professional lexicographers),
typical agreement between coders varied from 67% to 78%.  In a study
that used professional lexicographers and arbitration to resolve
disagreements, the coders achieved 95.5% agreement.    (012)
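
As a rough illustration of what such agreement figures measure, the
following Python sketch computes raw percentage agreement between two
coders, together with Cohen's kappa, one common chance-corrected
coefficient.  The sense labels and the ten-item sample are invented
for the example and are not taken from the paper, and the paper's
figures may be based on a different coefficient.

```python
from collections import Counter


def observed_agreement(coder_a, coder_b):
    """Fraction of items to which both coders assigned the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item list")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(coder_a)
    p_observed = observed_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Chance agreement: probability that both coders pick the same label
    # if each chooses according to their own label distribution.
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical annotations: two coders tag ten occurrences of "bank"
# with one of two invented sense labels.
coder_a = ["money", "money", "river", "river", "money",
           "money", "river", "money", "money", "river"]
coder_b = ["money", "money", "river", "money", "money",
           "river", "river", "money", "river", "river"]

print(f"raw agreement: {observed_agreement(coder_a, coder_b):.2f}")
print(f"Cohen's kappa: {cohen_kappa(coder_a, coder_b):.2f}")
```

On this invented sample the coders agree on 7 of 10 items, but with
only two labels about half of that agreement would be expected by
chance, so the chance-corrected value is considerably lower than the
raw percentage.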

Note that even professionals working under carefully controlled
conditions, with arbitration to resolve disagreements, did not achieve
100% agreement.  In classical formal deduction, a single contradiction
among the premises allows a theorem prover to derive anything at all.
If the coders disagree on something between 4.5% and 33% of the tags,
a comparable share of the data is suspect, and all the claims about
the need for formal precision in the axioms and proof procedures
become questionable.    (013)
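
To give a sense of scale, the following Python sketch estimates how
likely it is that every tag used in a deduction is correct.  It rests
on two simplifying assumptions that are not claims from the papers
cited above: the coder disagreement rate is treated as a per-tag error
rate, and errors on different tags are treated as independent.  The
function name and the sample sizes are invented for the example.

```python
def prob_all_correct(error_rate, n_inputs):
    """Probability that all n_inputs tags are correct.

    Assumes each tag is wrong independently with probability error_rate,
    so the result is (1 - error_rate) ** n_inputs.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if n_inputs < 0:
        raise ValueError("n_inputs must be non-negative")
    return (1.0 - error_rate) ** n_inputs


# The two ends of the range discussed above: 4.5% (professionals with
# arbitration) and 33% (worst case for naive coders).
for error_rate in (0.045, 0.33):
    for n_inputs in (1, 10, 100):
        p = prob_all_correct(error_rate, n_inputs)
        print(f"error rate {error_rate:.3f}, {n_inputs:3d} tags: "
              f"P(all correct) = {p:.4f}")
```

Under these assumptions, even the 4.5% rate leaves less than a
two-in-three chance that ten tags are all correct, and the chance
falls off exponentially as more tagged inputs enter a proof.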

These issues do not imply that formal logic and ontology are useless,
but they do imply that we have to revisit the assumptions about
using formal logic on the XML tags of the WWW or SW.    (014)

John Sowa    (015)

