[Top] [All Lists]

Re: [ontolog-forum] Run, put, and set

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Rich Cooper" <rich@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 11 Jun 2011 15:06:21 -0700
Message-id: <20110611220625.1A3AD138CC4@xxxxxxxxxxxxxxxxx>

Hi David,


Thanks for your thoughts on the subject.  You and I have discussed the viability of small vocabularies for application specific linguistic competence before, though not in this depth.  


In my view, the small number of key words in a specific context (situation) satisfies that requirement without requiring that a plethora of interpretations has to be understood by the software that processes patents.  This simple fact has been demonstrated by the success of key word search when there is NO meaning represented, interpreted or even stored regarding the key word vocabularies.  


For example, in analyzing patents, I use the following two-three dozen frequent words to partition the claim sentence into claim elements that can be further syntactically analyzed:




























and a few others (two-three dozen) that I don’t remember off hand.  But that small number of highly frequent words are used in that context are adequate.  I don’t have to understand the other words in the claim sentence to be able to organize it into smaller clumps that can be more easily analyzed.  


That small vocabulary of two-three dozen or so words works for ALL known patent claims on the USPTO web site.  That site must have a high hundred thousand patents, perhaps up to a million patents, that have each been reviewed multiple times, organized into the PTO database format, and presented to the viewer with a simple web interface.  There is no need to use other words in partitioning claims into elements other than the two-three dozen or so I use as a personal standard.  


After that, additional rare words have much more specialized uses.  It may require a hundred thousand words of vocabulary, or even more to represent the vocabulary of ALL patents, but the slice of two dozen or so is adequate for the claim sentence partitioning task.  The total number of words used in patent claims is huge.  That is because there are typically 10 to 50 claims, averaging about 20 claims as typical.  


Consider this claim 13 from the 7,209,923:


I claim: ...

13.  A method for extracting contextual information about a plurality of objects and a plurality of activities from a plurality of texts comprising: annotating each said text with metadata information;  transforming each said text into one or more database rows that reflect the said annotations through tables and columns of the said database;  selecting a number P partitioning columns, said partitioning columns to be used for partitioning said database rows into partitions;  each said partition comprising zero or more rows, wherein if one or more rows exist, each said row having the same values, or value ranges, in each row and partitioning column in said partition selecting a number R processing columns of said database to be used in extracting information from said database;  said processing columns comprising structured columns and unstructured columns;  said unstructured columns including one or more columns containing words or phrases expressed in a language;  modeling classes and relationships among said plurality of objects and said plurality of activities described by entries in said database;  searching the said partitions for said contextual information based on the modeling;  and presenting said contextual information to a user;  wherein each of P and R is greater than zero.


Context specific words are much more numerous.  The following 34 words are “rare words” which describe the technology being claimed for this patent claim 13:





































Organizing that claim into a structured form based on the twenty or so frequent claim words, it looks more like this:



Note that the “extracting contextual information” step means only that, not something that DEFINES each word in specific terms.  That role is up to the patent specification, which explains the meaning of each word by using each one in sentences in the specification.  


Therefore the twenty word vocabulary is adequate for constructing context even without defining the semantic meaning of each word, or even of the twenty!  All those words are understood by the people who write patents, people who review and rule on patent interpretations, and people who invest in the patents.  But the context construed in this way is construed by the VIEWER, not by the vocabulary itself.  That small vocabulary is a way of constraining claim sentences to meet the requirements of explaining a method or system in form suitable for readers.  


Then, further partitioning the claim elements into specific references to the patent specification, the viewer shows how the words are actually used in context.  Here is a sample claim chart for one of those claim elements:


in each row and partitioning column in said partition

selecting a number R processing columns of said database

to be used in extracting information from said database;


The following referenced sentences in the specification include, among others:


[D9,58] Additional columns can be defined, one per row, and added to the Columns table.


[D10,27] A Primary_Key 301 constraint means that each row in a table contains distinct values in the combined set of columns which are collectively labeled as Primary_Key 301 columns for that table.


[D10,30] Each row has a distinct value in this set of columns from that of all other rows.


[D10,48] Therefore all columns in the metadata database named "Domain" 223 satisfy the Foreign_Key constraint shown in row 2 of the Foreign_Keys table 320.


[D11,17] Eventually, after partitioning has been iterated sufficiently, partitions result that are empty of information, or that have only one row in specified processing columns 402.


[D17,63] The stream of phrases 1203 may come from a single row in a single column, or it may come from the database nodes 503 that are labeled as being on the marked solution subtree 504.


[D19,16] Embodied functions can add a row to a table, delete a row from a table, or change a row in a table.


Simply lexically analyzing the claim vocabulary is sufficient, as shown above, to extract the full context of a long, wordy specification into just those references that relate to the rare words used in the claim, after the two-three dozen words have been used to partition the claim into elements.  


But in any case, the VIEWER (a person) must interpret even the simplest of these words.  Therefore, by John’s definition, those words comprise rehetoric, not logic!


That definition doesn’t hunt!  It is the viewer ALWYS who interprets the words to have useful meanings. 


The next question of just how many meanings a given word must have is not significant to the task of interpreting patent specifications.  Word definitions are settled in court in a Markman hearing which puts the proper interpretation into the rare words so that the two sides can debate the technology issues relevant to the patent.  


It isn’t necessary to list the multiplicity of meanings of rare words to get that task completed satisfactorily.  






Rich Cooper


Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2

From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of David Eddy
Sent: Saturday, June 11, 2011 1:59 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Run, put, and set


Rich -


On 2011-06-11, at 3:31 PM, Rich Cooper wrote:

In my opinion, there is no abstract language; there is only what you have designated in a previous email as “rhetoric”.  ALL language is messaging from one person to at least one other person, perhaps even the same person reasoning with himself over past experience and current issues.


As is my context...



Not all messaging/language is person-to-person.  There is also person-to-machine/software.


In the example above, there is only a machine to process the Word file which contains the patent.  That is how the patent gets represented in the database, how patents get searched and compared with other patents, and how patent litigation and prosecution are organized.  


I believe that in the person-to-machine "dialogue" the only concern is to have a unique string of characters within a "namespace."


That is not the ONLY concern IMHO.  And that, specifically, is something I object to – the SW uses lexical uniqueness to imply meaning, but I think that use is ineffective for most applications of linguistically competent software.  


A person is using software (that they may or may not understand the purpose of) & the software makes demands for say name length.


One of the more fun ones is the classic 8.3 style off MS-DOS, which one can still see in play in people's email addresses.  It is not unusual for someone to work at an organization that still has in place some sort of email (registration?) system that restricts the left hand name to 8 characters.  With only 8 characters available, things can quickly lurch to the cryptic.


Agreed, but today’s software is not limited by storage sizes as MSDOS software was, and 8 character name constraints are a thing of the past.  


Such "artificial" constraints are obviously long past... yet still in active use.



I would argue these are also a form of abstract language... certainly a form of language that many people have to wrestle with daily.


I disagree that these example comprise ABSTRACT language.  As I explained above, it is only the lexical scoping of the claim into elements that is needed to partition the basic claim sentence.  The abstraction of Noun, Adjective, Subject, preposition, etc need not even be considered for this  purpose, yet those are the fundamental objects which linguists use to process language.  


Thanks for your inputs,




David Eddy


Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (01)

<Prev in Thread] Current Thread [Next in Thread>