
[ontolog-forum] FW: Semantic Enterprise Architecture -Interoperability?

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Rich Cooper" <rich@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 8 Sep 2010 23:23:01 -0700
Message-id: <20100909062305.A0097138D0D@xxxxxxxxxxxxxxxxx>

Hi Doug,


After more thought, here is a better reply giving my perception of that architecture; my comments are interleaved with yours below.





Rich Cooper


Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2


-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of doug foxvog
Sent: Wednesday, September 08, 2010 4:22 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Semantic Enterprise Architecture -Interoperability?


On Wed, September 8, 2010 17:57, Rich Cooper said:

> David,


> I think he is referring to something like unique synsets, which have a

> single meaning, but which can have multiple word instantiations, a la

> WordNet.


Except that WordNet synsets do not have unique meanings.  The multiple

words in a synset have similar meanings.  I am referring to terms in an

ontology, each of which has a unique meaning, and which may be expressed

in a natural language in multiple ways.


> That arrow runs from the single meaning (synset) toward the

> {words}, not the other way around.  Reverse that arrow and you have the

> single interpretation that can be actually emulated; at the other end, you

> have words that point to several synsets which may alternatively interpret

> them, so the direction of the arrow is the critical concept I think.


It seems to me, both that an individual word has multiple meanings and

that individual meanings can be expressed by multiple words or phrases.

The arrow direction would depend upon the relationship indicated between

the entities referenced by the head and tail of the arrow.


Agreed.  You have to get from word to synset when inputting NLP, and from synset to word to generate NLP, so full NLP needs both kinds of representation: two many-to-many relationships, one in each direction, between all synsets and all words in the general case.  That leads to a very, very sparse matrix (MxN) for typical English applications.  But most EAs won't have to GENERATE a lot of complicated NLP, and what they do generate can likely be canned in a simpler database than the DB the interpretERs use for disambiguation and context construction.  So it is the inputting and interpretATION of the NLP that I am addressing, not the generation.
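
As a rough illustration of the two directions, here is a minimal Python sketch.  The class name `LexicalIndex`, its methods, and the sample words and synset identifiers are all hypothetical; they are not taken from WordNet or from any existing system.  The point is only that storing the defined <word x synset> pairs in two lookup tables avoids materializing the sparse MxN matrix.

```python
from collections import defaultdict


class LexicalIndex:
    """Many-to-many word <-> synset mapping kept as two sparse indexes."""

    def __init__(self):
        # word -> synsets: the direction needed when interpreting input
        self._word_to_synsets = defaultdict(set)
        # synset -> words: the direction needed when generating output
        self._synset_to_words = defaultdict(set)

    def add(self, word, synset_id):
        """Record one defined <word x synset> intersection in both indexes."""
        self._word_to_synsets[word].add(synset_id)
        self._synset_to_words[synset_id].add(word)

    def synsets_for(self, word):
        """Return every synset that may interpret the word."""
        return set(self._word_to_synsets.get(word, ()))

    def words_for(self, synset_id):
        """Return every word that can express the synset."""
        return set(self._synset_to_words.get(synset_id, ()))


index = LexicalIndex()
index.add("bank", "financial_institution")   # illustrative identifiers only
index.add("bank", "river_edge")
index.add("shore", "river_edge")

print(sorted(index.synsets_for("bank")))
print(sorted(index.words_for("river_edge")))
```

Only the interpretation direction (`synsets_for`) would be needed for the inputting side I am addressing here.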


The problem of analyzing NLP is tough enough to fill this discussion on its own.  If we try to cover generation of NLP in depth as well, my head starts to hurt.  Maybe we can hold generation for some future day's discussion.


But if the synset doesn't have a unique interpretANT, then there must be multiple rows for that set of interpretANTs also, one per <interpretANT x interpretER>, and the two sparse columns could be folded into the list of all EA columns where that packaging is appropriate for processing purposes.  Use a different set of columns, even a different DB and processors, if the NLP processing load is too high for the application’s EA context.  


Below the discussion leaves ontologies (if it was really there) and moves

to a discussion of enterprise architecture databases.


> So the enterprise architecture database should have columns that are

> unique synsets (in effect) of enterprise meaning.  Each synset could have one row

> for every word that instantiates it, perhaps one row for every word that

> can be interpreted with that synset as interpretant.


Are you saying that each column has a different set of rows?


Yes, by definition in the DBMS world, a column participates in a table (i.e. a relation), and that table has a set of {row}s for which that column is the intersection.  So imagine there is a column vector projection from the table, with one row in the projected column vector for every defined intersection in that <column x table>.  


(Visualization helps - see figures 1 through 4 for the metadata DB in http://www.englishlogickernel.com/Patent-7-209-923-B1.PDF as an example way to build metadata repositories).  


Or are you suggesting a matrix of synsets with a row for each word in

the synset?  Since there would be a lot more synsets than words in a

synset, perchance it would be better to have the rows being synsets and

the columns being words in the synset.


Looking at WN, I think there are lots more words than synsets, and that is certainly true for verb synsets IMHO, but either way, the selection criteria depend on the application at hand.  Just because one has more than the other doesn't make it the best architectural choice; there are too many performance considerations beyond that.  


I'm suggesting that a set of rows, each of which invokes one column with each row storing an interpretANT identifier, has to be filtered down to a single row (i.e. one interpretANT to identify) if the interpretER is going to be able to execute it.  Otherwise, you might want to program it to randomly choose when it gets ambiguous queries for that column.  But random choice is likely to result in undesirable consequences at times, so it would have to be thoroughly understood, tested, and planned based on situation predicates. 
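
A minimal Python sketch of that filtering step, under my own assumptions: the row layout (`word`, `context`, `interpretant`), the function `select_interpretant`, and the exception name are hypothetical, chosen only for illustration.  Instead of choosing randomly, this version refuses to execute when the predicates leave anything other than one row.

```python
class AmbiguousInterpretation(Exception):
    """Raised when filtering does not leave exactly one candidate row."""


def select_interpretant(rows, predicates):
    """Apply each situation predicate in turn; return the single surviving interpretant."""
    candidates = list(rows)
    for predicate in predicates:
        candidates = [row for row in candidates if predicate(row)]
    if len(candidates) != 1:
        # Zero or several rows left: the interpretER cannot execute this safely.
        raise AmbiguousInterpretation(
            f"{len(candidates)} candidate rows remain; expected exactly 1"
        )
    return candidates[0]["interpretant"]


# Hypothetical rows for one column, one interpretANT identifier per row.
rows = [
    {"word": "post", "context": "payroll", "interpretant": "record_ledger_entry"},
    {"word": "post", "context": "sales", "interpretant": "publish_listing"},
]


def in_payroll(row):
    """Example situation predicate: keep only rows for the payroll context."""
    return row["context"] == "payroll"


print(select_interpretant(rows, [in_payroll]))
```

Raising on ambiguity keeps the failure visible; a random-choice policy could be substituted at the `raise`, with the testing burden described above.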


Think of an instruction set, perhaps the java byte codes.  Only those byte codes can actually be interpretED by a software interpretER in the browser if the goal of the EA design is to be interpretED in a java virtual machine.  So one way or another, the syntax, semantics, pragmatics and interpretERs have to break down an utterance into jvm codes in that context to meet that goal.  


> Which brings up the problem of representing multiple interpreters.  Would

> each synset have one set of interpretant rows for each interpreter?  It

> seems like the only conclusion unless you want everyone in the enterprise

> to use words the same way (unlikely to be successful).


It could be useful to define contexts in which given words have different

meanings.  Then the interpreter would choose their context (payroll,

sales, etc.) for their current task.  Separate rows for each interpreter

would not be called for.


For disambiguating context, I use figures 5 through 24 of the patent document cited above.  Context seems more intimately related to the interpretER than to the interpretANT.  IDEF0 activity charts are great initial context definition charts if you allow multiple decompositions (perhaps one per synset of the verb being interpreted) for each IDEF0 context diagram.


They are also fairly easy to represent in a DB and use to instrument aggregate performance data for analysis, as shown in the later figures 12-24 which illustrate decisions about how to partition the DB until you reach an unambiguously interpretABLE single row.  


Even if restricted to database tables, if one used a column after a word

to encode the set of contexts in which it was used one wouldn't need to

repeat rows (or tables) for each context.


Actually, in that architecture embodiment, a context is more complicated than a single column for most applications I can think of.  And it depends on the complexity of the specific application.  I suppose if your application has few enough contexts, and the contexts are simple enough, and there are enough tables to switch among, you can use that simple architecture for NLP.  But most likely, English applications will need hundreds of tables with thousands of columns, at least as I imagine the future unfolding in NLP.
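
For the simple case only, the single context column idea might look like the following Python sketch using the standard `sqlite3` module.  The table name `lexicon`, its columns, the comma-separated encoding of the context set, and the sample rows are all my assumptions for illustration; they are not taken from the patent figures or from any existing schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lexicon (word TEXT, contexts TEXT, meaning TEXT)")
conn.executemany(
    "INSERT INTO lexicon VALUES (?, ?, ?)",
    [
        # One row per meaning; the contexts column lists every context sharing it.
        ("order", "sales,purchasing", "customer_or_supplier_order"),
        ("order", "payroll", "garnishment_order"),
    ],
)


def meanings(conn, word, context):
    """Return the meanings of a word whose context set includes the given context."""
    rows = conn.execute(
        "SELECT contexts, meaning FROM lexicon WHERE word = ?", (word,)
    )
    return [meaning for contexts, meaning in rows if context in contexts.split(",")]


print(meanings(conn, "order", "sales"))
print(meanings(conn, "order", "payroll"))
```

This avoids repeating a row per context, but it treats a context as a flat label, which is the simplification I am doubtful about for larger applications.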


== doug f


Thanks for the thoughtful comments; your advice is always good.



> -Rich


> Sincerely,

> Rich Cooper

> EnglishLogicKernel.com

> Rich AT EnglishLogicKernel DOT com

> 9 4 9 \ 5 2 5 - 5 7 1 2


> -----Original Message-----

> From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx

> [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of David Eddy

> Sent: Wednesday, September 08, 2010 2:24 PM

> To: doug@xxxxxxxxxx; [ontolog-forum]

> Subject: Re: [ontolog-forum] Semantic Enterprise Architecture

> -Interoperability?


> Doug -


> On Sep 8, 2010, at 5:12 PM, doug foxvog wrote:


>> a Semantic Web needs ontologies of terms with fixed meanings


> Is this saying that a term (word, phrase, acronym, abbreviation,

> whatever) can only have a single meaning?


> What did I miss here?



> As I have observed before & will undoubtedly observe again...


> George Miller's "Ambiguous Words"   http://www.kurzweilai.net/ambiguous-words

> offers an average of 10 meanings per (real) word.


> My dictionary of largely acronyms (but where's the line between

> acronym & real word... I don't have a clue) finds some 34 meanings

> per term/word.  Whittling that down to 1 meaning per term is going to

> be tough.


> ___________________

> David Eddy

> deddy@xxxxxxxxxxxxx





