Hi Doug,
After more thought, here is a better reply on my perception of that
architecture, as shown below,
-Rich
Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2
-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of doug foxvog
Sent: Wednesday, September 08, 2010 4:22 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Semantic Enterprise Architecture -Interoperability?
On Wed, September 8, 2010 17:57, Rich Cooper said:
> David,
>
> I think he is referring to something like unique synsets, which have a
> single meaning, but which can have multiple word instantiations, a la
> WordNet.
Except that WordNet synsets do not have unique meanings. The multiple
words in a synset have similar meanings. I am referring to terms in an
ontology, each of which has a unique meaning, and which may be expressed
in a natural language in multiple ways.
> That arrow runs from the single meaning (synset) toward the
> {words}, not the other way around. Reverse that arrow and you have the
> single interpretation that can be actually emulated; at the other end, you
> have words that point to several synsets which may alternatively interpret
> them, so the direction of the arrow is the critical concept I think.
It seems to me both that an individual word has multiple meanings and
that individual meanings can be expressed by multiple words or phrases.
The arrow direction would depend upon the relationship indicated between
the entities referenced by the head and tail of the arrow.
Agreed. You have to get from word to synset when inputting NLP, and from
synset to word to generate NLP, so for full NLP you need both kinds of
representations: two many-to-many relationships, one in each direction,
between all synsets and all words in the general case. That leads to a
very, very sparse matrix (MxN) for typical English applications. But most
EAs won't have to GENERATE a lot of complicated NLP, and what they do
generate can likely be canned in a simpler database than the DB the
interpretERs use for disambiguation and context construction. So it is the
inputting and interpretATION of the NLP that I am addressing, not the
generation.
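
A minimal sketch of the two directions described above, storing only the
non-empty cells of the word x synset matrix. The words and synset IDs are
made up for illustration and are not real WordNet identifiers; the function
name build_indexes is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical (word, synset_id) pairs; not taken from WordNet.
PAIRS = [
    ("bank", "S1_financial_institution"),
    ("bank", "S2_river_edge"),
    ("shore", "S2_river_edge"),
    ("depository", "S1_financial_institution"),
]

def build_indexes(pairs):
    """Index the many-to-many relation in both directions."""
    word_to_synsets = defaultdict(set)  # used when interpreting input
    synset_to_words = defaultdict(set)  # used when generating output
    for word, synset in pairs:
        word_to_synsets[word].add(synset)
        synset_to_words[synset].add(word)
    return word_to_synsets, synset_to_words

word_to_synsets, synset_to_words = build_indexes(PAIRS)

# Fraction of the M x N matrix that is actually populated.
density = len(PAIRS) / (len(word_to_synsets) * len(synset_to_words))
```

With a realistic vocabulary the density would be tiny, which is why the
relation is kept as pairs rather than as a full matrix.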
The problem of analyzing NLP is tough enough for us to go into with this
discussion. If we try to cover generation of NLP in depth as well, my head
starts to hurt. Maybe we can hold generation for some future day’s
discussion.
But if the synset doesn't have a unique interpretANT, then there must be
multiple rows for that set of interpretANTs also, one per
<interpretANT x interpretER>, and the two sparse columns could be folded
into the list of all EA columns where that packaging is appropriate for
processing purposes. Use a different set of columns, even a different DB
and processors, if the NLP processing load is too high for the
application’s EA context.
Below, the discussion leaves ontologies (if it was really there) and moves
to a discussion of enterprise architecture databases.
> So the enterprise architecture database should have columns that are
> unique synsets (in effect) of enterprise meaning. Each synset could have
> one row for every word that instantiates it, perhaps one row for every
> word that can be interpreted with that synset as interpretant.
Are you saying that each column has a different set of rows?
Yes, by definition in the DBMS world, a column participates in a table
(i.e. a relation), and that table has a set of {row}s for which that column
is the intersection. So imagine there is a column vector projection from
the table, with one row in the projected column vector for every defined
intersection in that <column x table>.
(Visualization helps - see figures 1 through 4 for the metadata DB in
http://www.englishlogickernel.com/Patent-7-209-923-B1.PDF
as an example way to build metadata repositories).
Or are you suggesting a matrix of synsets with a row for each word in
the synset? Since there would be a lot more synsets than words in a
synset, perchance it would be better to have the rows being synsets and
the columns being words in the synset.
Looking at WN, I think there are lots more
words than synsets, and that is certainly true for verb synsets IMHO, but
either way, the selection criteria depend on the application at hand.
Just because one has more than the other doesn't make it the best architectural
choice; there are too many performance considerations beyond that.
I'm suggesting that a set of rows, each of
which invokes one column with each row storing an interpretANT identifier, has
to be filtered down to a single row (i.e. one interpretANT to identify) if the
interpretER is going to be able to execute it. Otherwise, you might want
to program it to randomly choose when it gets ambiguous queries for that
column. But random choice is likely to result in undesirable consequences
at times, so it would have to be thoroughly understood, tested, and planned
based on situation predicates.
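
A minimal sketch of that filtering step, assuming each candidate row is a
plain dict and each situation predicate is a function from row to bool. The
row fields, the example interpretANT names, and the function name
disambiguate are all hypothetical. Instead of choosing at random, this
version refuses to proceed when more or fewer than one row survives.

```python
def disambiguate(rows, predicates):
    """Apply each predicate in turn; return the single surviving row or raise."""
    candidates = list(rows)
    for keep in predicates:
        candidates = [row for row in candidates if keep(row)]
    if len(candidates) != 1:
        # Ambiguous (or empty) result: do not guess.
        raise LookupError(f"{len(candidates)} candidate interpretants remain")
    return candidates[0]

# Hypothetical candidate rows for the word "post".
ROWS = [
    {"word": "post", "interpretant": "record_ledger_entry", "context": "payroll"},
    {"word": "post", "interpretant": "publish_listing", "context": "sales"},
]

chosen = disambiguate(
    ROWS,
    [lambda r: r["word"] == "post", lambda r: r["context"] == "payroll"],
)
```

A random-choice fallback could replace the LookupError, but only where the
consequences of a wrong pick have been tested for that situation.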
Think of an instruction set, perhaps the Java bytecodes. Only those
bytecodes can actually be interpretED by a software interpretER in the
browser if the goal of the EA design is to be interpretED in a Java virtual
machine. So one way or another, the syntax, semantics, pragmatics and
interpretERs have to break down an utterance into JVM codes in that context
to meet that goal.
> Which brings up the problem of representing multiple interpreters. Would
> each synset have one set of interpretant rows for each interpreter? It
> seems like the only conclusion unless you want everyone in the enterprise
> to use words the same way (unlikely to be successful).
It could be useful to define contexts in which given words have different
meanings. Then the interpreter would choose their context (payroll,
sales, etc.) for their current task. Separate rows for each interpreter
would not be called for.
For disambiguating context, I use figures 5 through 24 of the patent
cited above. Context seems more intimately related to
the interpretER than to the interpretANT. IDEF0 activity charts are great
initial context definition charts if you allow multiple decompositions (perhaps
one per synset of the verb being interpreted) for each IDEF0 context diagram.
They are also fairly easy to represent in
a DB and use to instrument aggregate performance data for analysis, as shown in
the later figures 12-24 which illustrate decisions about how to partition the
DB until you reach an unambiguously interpretABLE single row.
Even if restricted to database tables, if one used a column after a word
to encode the set of contexts in which it was used, one wouldn't need to
repeat rows (or tables) for each context.
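
A minimal sketch of such a contexts column, using an in-memory SQLite table.
The table name, column names, sample rows, and the "|"-delimited encoding of
the context set are all assumptions made for illustration; a real schema
might normalize contexts into their own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_meaning (word TEXT, meaning TEXT, contexts TEXT)")
conn.executemany(
    "INSERT INTO word_meaning VALUES (?, ?, ?)",
    [
        # One row per (word, meaning); the third column lists every context
        # in which that meaning applies, delimited by "|".
        ("post", "record_ledger_entry", "|payroll|accounting|"),
        ("post", "publish_listing", "|sales|"),
    ],
)

def meanings_in_context(word, context):
    """Return the meanings of `word` whose context set contains `context`."""
    cur = conn.execute(
        "SELECT meaning FROM word_meaning WHERE word = ? AND contexts LIKE ?",
        (word, f"%|{context}|%"),
    )
    return [row[0] for row in cur.fetchall()]
```

Here a meaning shared by payroll and accounting occupies one row rather
than two.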
Actually, in that architecture embodiment,
a context is more complicated than a single column for most applications I
think of. And, it depends on the specific application complexity. I
suppose if your application has few enough contexts, and the contexts are
simple enough, and there are enough tables to switch among, you can use that
simple architecture for NLP. But most likely English applications will
need hundreds of tables with thousands of columns, at least as I imagine the
future unfolding in NLP.
== doug f
Thanks for the thoughtful comments; your advice is always good,
-Rich
> -Rich
>
> Sincerely,
> Rich Cooper
> EnglishLogicKernel.com
> Rich AT EnglishLogicKernel DOT com
> 9 4 9 \ 5 2 5 - 5 7 1 2
>
> -----Original Message-----
> From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
> [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of David Eddy
> Sent: Wednesday, September 08, 2010 2:24 PM
> To: doug@xxxxxxxxxx; [ontolog-forum]
> Subject: Re: [ontolog-forum] Semantic Enterprise Architecture -Interoperability?
>
> Doug -
>
> On Sep 8, 2010, at 5:12 PM, doug foxvog wrote:
>
>> a Semantic Web needs ontologies of terms with fixed meanings
>
> Is this saying that a term (word, phrase, acronym, abbreviation,
> whatever) can only have a single meaning?
>
> What did I miss here?
>
>
> As I have observed before & will undoubtedly observe again...
>
> George Miller's "Ambiguous Words"
> http://www.kurzweilai.net/ambiguous-words
> offers an average of 10 meanings per (real) word.
>
> My dictionary of largely acronyms (but where's the line between
> acronym & real word... I don't have a clue) finds some 34 meanings
> per term/word. Whittling that down to 1 meaning per term is going to
> be tough.
>
> ___________________
> David Eddy
> deddy@xxxxxxxxxxxxx