Hi John,

In http://www.jfsowa.com/pubs/cg4cs.pdf ("Conceptual Graphs for Representing Conceptual Structures"), you state that:
"For empirical subjects, however, conjunction and the existential quantifier are the only operators that can be observed directly, and the others must be inferred from indirect evidence. Therefore, Peirce’s choice of primitives combined with a mechanism for defining other operators seems appropriate."
Let's choose a name for this algorithm
you've postulated. I will call it "Sowa's Empirical Representation
Algorithm", or SERA, in your honor.
Suppose E(I) is the set of experiences e[j] in which object I is embedded over its lifetime. Elements e[j] can be simple or complex experiences (perhaps you would call them "situations" in FOL terminology), and they are structured by sensing and recognition processes that are outside the scope of SERA. SERA will do I/O on this structured form, and its atoms are defined as lexically distinct signals that can be distinguished from all other signals in SERA. You suggested CGIF; I was thinking JSON, and this might be a good way to weigh the costs and benefits of the various approaches.
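To make the JSON option concrete, here is a minimal sketch of what one e[j] might look like as a JSON-serializable record. All field names here are my own placeholders, not a proposed SERA schema:

```python
import json

# Hypothetical record for one experience e[j]; every field name below is
# illustrative only, not part of SERA as discussed in this message.
experience = {
    "id": "e-0001",            # candidate primary key for associative retrieval
    "object": "I",             # the object embedded in the experience
    "atoms": ["fox.n", "jumped.v", "dog.n"],   # lexically distinct signals
    "structure": {             # output of sensing/recognition, opaque to SERA
        "type": "sentence",
        "text": "the quick brown fox jumped over the lazy dog",
    },
}

# String I/O: JSON round-trips the structured form losslessly.
encoded = json.dumps(experience)
decoded = json.loads(encoded)
assert decoded == experience
```

The appeal of JSON here is that the structured form survives string I/O unchanged, so the same record can move between the parser, the store, and the comparison machinery.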
CGIF might be the form in which e[] are represented for string I/O, but however e[] is structured, the logic of the contents of e[] is to be stored relationally, fanned out over tables and columns as needed to interpret the contents of that singular experience. It will be a complex set of paths, which I call an And-Or forest, since everyone knows what that structure is.
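Here is a rough sketch of the relational fan-out I mean, using an in-memory SQLite database. The table and column names, and the handful of nodes and arcs, are purely illustrative:

```python
import sqlite3

# Minimal sketch of fanning one experience out over relational tables.
# The schema is an assumption for illustration, not a fixed SERA design.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE experience (eid TEXT PRIMARY KEY, text TEXT);
CREATE TABLE node (eid TEXT, nid INTEGER, label TEXT,
                   PRIMARY KEY (eid, nid));
CREATE TABLE arc  (eid TEXT, src INTEGER, dst INTEGER, link TEXT);
""")

db.execute("INSERT INTO experience VALUES (?, ?)",
           ("e1", "the quick brown fox jumped over the lazy dog"))
nodes = [("e1", 0, "the"), ("e1", 1, "quick.a"), ("e1", 2, "brown.a"),
         ("e1", 3, "fox.n"), ("e1", 4, "jumped.v")]
db.executemany("INSERT INTO node VALUES (?, ?, ?)", nodes)
arcs = [("e1", 0, 3, "Ds"), ("e1", 1, 3, "A"),
        ("e1", 2, 3, "A"), ("e1", 3, 4, "Ss")]
db.executemany("INSERT INTO arc VALUES (?, ?, ?, ?)", arcs)

# Every path is now findable again, e.g. all A-links into fox.n:
rows = db.execute("""SELECT n.label FROM arc a JOIN node n
                     ON n.eid = a.eid AND n.nid = a.src
                     WHERE a.link = 'A' AND a.dst = 3""").fetchall()
print(rows)   # quick.a and brown.a
```

The point is only that once the arcs are in columns, any path through the forest becomes an ordinary relational query.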
It is also necessary to order the e[] within the And-Or forest based on a stable, repeatable associative storage and retrieval method - a primary key - that can be used to recognize e[k] that are "equivalent" to previously experienced e[j]. Like Selz before us, we need to identify which parts of the representation are variable and which are invariant.
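One cheap way to get such a primary key, assuming (purely for illustration) that the link types are the invariant part and the terminal words are the variable part, is to hash the invariant structure:

```python
import hashlib

# Sketch of a stable, repeatable key: hash the invariant part of the
# parse (the sorted link types) and treat the terminal words as the
# variable part. That split is my assumption, not settled doctrine.
def experience_key(links):
    """links: iterable of (link_type, left_word, right_word) triples."""
    invariant = "|".join(sorted(t for t, _, _ in links))
    return hashlib.sha256(invariant.encode()).hexdigest()[:16]

fox = [("Ds", "the", "fox.n"), ("A", "quick.a", "fox.n"),
       ("A", "brown.a", "fox.n"), ("Ss", "fox.n", "jumped.v")]
cat = [("Ds", "a", "cat.n"), ("A", "sly.a", "cat.n"),
       ("A", "black.a", "cat.n"), ("Ss", "cat.n", "slept.v")]

# Same invariant link structure -> same key, so a new e[k] is
# recognized as "equivalent" to a previously experienced e[j].
assert experience_key(fox) == experience_key(cat)
```

Any stable canonicalization would do; the hash is just one way to make the key short and repeatable.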
If a sentence is one complete experience e[], I would like to match that sentence against a database of "similar" sentences and collect historical information about how things went in those prior e[] before taking action in the current e[]. The first phase of analysis is to organize those e[] into a traversable, identifiable history database.
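A toy sketch of that first phase, with an in-memory dict standing in for the history database and the structural key left as an opaque string:

```python
from collections import defaultdict

# Toy history database: prior e[] indexed by their structural key, so
# that outcomes can be consulted before acting in the current e[].
# The key format and the outcome strings are placeholders of mine.
history = defaultdict(list)

def store(key, outcome):
    """Record how things went in one prior e[] under its key."""
    history[key].append(outcome)

def prior_outcomes(key):
    """Look up all recorded outcomes for 'equivalent' prior e[]."""
    return history.get(key, [])

store("A|A|Ds|Ss", "outcome: succeeded")
store("A|A|Ds|Ss", "outcome: failed")
print(prior_outcomes("A|A|Ds|Ss"))   # both prior episodes
```

Replace the dict with the relational store and the key with whatever canonicalization SERA settles on, and this is the traversable history I have in mind.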
Using the Link Grammar Parser (LGP), I can get parse information directly from the e[] and use that metalevel information to index each sentence, phrase, designator, verb, and so forth in relational, tabular form. Here is a sample parse from the LGP:
the quick brown fox jumped over the lazy dog

++++Time 0.04 seconds (0.04 total)
Found 2 linkages (2 had no P.P. violations)
  Linkage 1, cost vector = (UNUSED=0 DIS=0 AND=0 LEN=18)

 +---------Ds---------+             +-------Js-------+
 |     +-------A------+             |    +-----Ds----+
 |     |       +---A--+--Ss--+-MVp--+    |    +---A--+
 |     |       |      |      |      |    |    |      |
the quick.a brown.a fox.n jumped.v over the lazy.a dog.n
By collecting all the arc annotation terms (Ds, Js, A, Ss, MVp, ...) and node annotations, one obtains an expression:

( jumped.v
  fox.n (Ds(the,A(quick.a,A(brown.a,fox.n)))),
  dog.n (Js(over,Ds(the,A(lazy.a,dog.n))))
)
This lispish expression is nicely interpretable, with efficient performance and representation choices, IMHO. But every node, every arc, and every terminal has to be stored in the e[] database as information that can be found again when the next e[] comes along to be compared in turn against its predecessors.
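As a small aside, the nested lispish view can be regenerated mechanically from the stored (link, modifier) arcs. A sketch in Python, where the innermost-first nesting rule is my reading of the expression above:

```python
# Sketch: regenerate the nested lispish view of a phrase from its
# stored (link, modifier) arcs. The fold-from-innermost nesting rule
# is my inference from the expression shown, not a fixed convention.
def lispish(head, modifiers):
    """modifiers: list of (link, word) pairs, outermost link first."""
    expr = head
    for link, word in reversed(modifiers):
        expr = f"{link}({word},{expr})"
    return expr

print(lispish("fox.n", [("Ds", "the"), ("A", "quick.a"), ("A", "brown.a")]))
# -> Ds(the,A(quick.a,A(brown.a,fox.n)))
```

So the lispish form need not be stored at all; it is just one view computed on demand from the relational arcs.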
So the lispish form above is good for interpreting the sentence, producing structured logical storage that lasts long enough to interpret the statement. Other views of the same information are also needed, and are available, so long as they are based on discriminant FOL predicates using only the LGP metalevel and terminal representations.
Which brings me to the question: Is there enough information in the LGP's output to generate a conceptual graph view of the sentence? One is syntax and the other is semantics, so they may simply be orthogonal representations that cannot be rationally compared. If the parse alone is not enough, what else is needed, and how practical is it to translate an arbitrary English sentence into CG notation automatically?
Thanks,
-Rich