Re: [ontolog-forum] Run, put, and set

To:	"'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From:	"Rich Cooper" <rich@xxxxxxxxxxxxxxxxxxxxxx>
Date:	Sat, 11 Jun 2011 15:06:21 -0700
Message-id:	<20110611220625.1A3AD138CC4@xxxxxxxxxxxxxxxxx>

Hi David,

Thanks for your thoughts on the subject. You and I have discussed the viability of small vocabularies for application specific linguistic competence before, though not in this depth.

In my view, the small number of key words in a specific context (situation) satisfies that requirement without requiring that a plethora of interpretations has to be understood by the software that processes patents. This simple fact has been demonstrated by the success of key word search when there is NO meaning represented, interpreted or even stored regarding the key word vocabularies.

For example, in analyzing patents, I use the following two-three dozen frequent words to partition the claim sentence into claim elements that can be further syntactically analyzed:

The

Comprise

Comprises

Comprising

Comprised

Each

Either

Every

Exist

Exists

All

And

Plurality

Method

Methods

System

Systems

Said

Wherein

Wherefore

Such

That

and a few others (two-three dozen) that I don’t remember off hand. But that small number of highly frequent words are used in that context are adequate. I don’t have to understand the other words in the claim sentence to be able to organize it into smaller clumps that can be more easily analyzed.

That small vocabulary of two-three dozen or so words works for ALL known patent claims on the USPTO web site. That site must have a high hundred thousand patents, perhaps up to a million patents, that have each been reviewed multiple times, organized into the PTO database format, and presented to the viewer with a simple web interface. There is no need to use other words in partitioning claims into elements other than the two-three dozen or so I use as a personal standard.

After that, additional rare words have much more specialized uses. It may require a hundred thousand words of vocabulary, or even more to represent the vocabulary of ALL patents, but the slice of two dozen or so is adequate for the claim sentence partitioning task. The total number of words used in patent claims is huge. That is because there are typically 10 to 50 claims, averaging about 20 claims as typical.

Consider this claim 13 from the 7,209,923:

I claim: ...

13. A method for extracting contextual information about a plurality of objects and a plurality of activities from a plurality of texts comprising: annotating each said text with metadata information; transforming each said text into one or more database rows that reflect the said annotations through tables and columns of the said database; selecting a number P partitioning columns, said partitioning columns to be used for partitioning said database rows into partitions; each said partition comprising zero or more rows, wherein if one or more rows exist, each said row having the same values, or value ranges, in each row and partitioning column in said partition selecting a number R processing columns of said database to be used in extracting information from said database; said processing columns comprising structured columns and unstructured columns; said unstructured columns including one or more columns containing words or phrases expressed in a language; modeling classes and relationships among said plurality of objects and said plurality of activities described by entries in said database; searching the said partitions for said contextual information based on the modeling; and presenting said contextual information to a user; wherein each of P and R is greater than zero.

Context specific words are much more numerous. The following 34 words are “rare words” which describe the technology being claimed for this patent claim 13:

activities

annotating

annotations

classes

column

columns

contextual

database

expressed

extracting

language

metadata

modeling

objects

partition

partitioning

partitions

phrases

presenting

ranges

reflect

relationships

row

rows

searching

selecting

tables

text

texts

transforming

unstructured

user

Organizing that claim into a structured form based on the twenty or so frequent claim words, it looks more like this:

Note that the “extracting contextual information” step means only that, not something that DEFINES each word in specific terms. That role is up to the patent specification, which explains the meaning of each word by using each one in sentences in the specification.

Therefore the twenty word vocabulary is adequate for constructing context even without defining the semantic meaning of each word, or even of the twenty! All those words are understood by the people who write patents, people who review and rule on patent interpretations, and people who invest in the patents. But the context construed in this way is construed by the VIEWER, not by the vocabulary itself. That small vocabulary is a way of constraining claim sentences to meet the requirements of explaining a method or system in form suitable for readers.

Then, further partitioning the claim elements into specific references to the patent specification, the viewer shows how the words are actually used in context. Here is a sample claim chart for one of those claim elements:

in each row and partitioning column in said partition

selecting a number R processing columns of said database

to be used in extracting information from said database;

The following referenced sentences in the specification include, among others:

[D9,58] Additional columns can be defined, one per row, and added to the Columns table.

[D10,27] A Primary_Key 301 constraint means that each row in a table contains distinct values in the combined set of columns which are collectively labeled as Primary_Key 301 columns for that table.

[D10,30] Each row has a distinct value in this set of columns from that of all other rows.

[D10,48] Therefore all columns in the metadata database named "Domain" 223 satisfy the Foreign_Key constraint shown in row 2 of the Foreign_Keys table 320.

[D11,17] Eventually, after partitioning has been iterated sufficiently, partitions result that are empty of information, or that have only one row in specified processing columns 402.

[D17,63] The stream of phrases 1203 may come from a single row in a single column, or it may come from the database nodes 503 that are labeled as being on the marked solution subtree 504.

[D19,16] Embodied functions can add a row to a table, delete a row from a table, or change a row in a table.

Simply lexically analyzing the claim vocabulary is sufficient, as shown above, to extract the full context of a long, wordy specification into just those references that relate to the rare words used in the claim, after the two-three dozen words have been used to partition the claim into elements.

But in any case, the VIEWER (a person) must interpret even the simplest of these words. Therefore, by John’s definition, those words comprise rehetoric, not logic!

That definition doesn’t hunt! It is the viewer ALWYS who interprets the words to have useful meanings.

The next question of just how many meanings a given word must have is not significant to the task of interpreting patent specifications. Word definitions are settled in court in a Markman hearing which puts the proper interpretation into the rare words so that the two sides can debate the technology issues relevant to the patent.

It isn’t necessary to list the multiplicity of meanings of rare words to get that task completed satisfactorily.

JMHO,

-Rich

Sincerely,

Rich Cooper

EnglishLogicKernel.com

Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2

From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of David Eddy
Sent: Saturday, June 11, 2011 1:59 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Run, put, and set

Rich -

On 2011-06-11, at 3:31 PM, Rich Cooper wrote:

In my opinion, there is no abstract language; there is only what you have designated in a previous email as “rhetoric”. ALL language is messaging from one person to at least one other person, perhaps even the same person reasoning with himself over past experience and current issues.

As is my context...

Not all messaging/language is person-to-person. There is also person-to-machine/software.

In the example above, there is only a machine to process the Word file which contains the patent. That is how the patent gets represented in the database, how patents get searched and compared with other patents, and how patent litigation and prosecution are organized.

I believe that in the person-to-machine "dialogue" the only concern is to have a unique string of characters within a "namespace."

That is not the ONLY concern IMHO. And that, specifically, is something I object to – the SW uses lexical uniqueness to imply meaning, but I think that use is ineffective for most applications of linguistically competent software.

A person is using software (that they may or may not understand the purpose of) & the software makes demands for say name length.

One of the more fun ones is the classic 8.3 style off MS-DOS, which one can still see in play in people's email addresses. It is not unusual for someone to work at an organization that still has in place some sort of email (registration?) system that restricts the left hand name to 8 characters. With only 8 characters available, things can quickly lurch to the cryptic.

Agreed, but today’s software is not limited by storage sizes as MSDOS software was, and 8 character name constraints are a thing of the past.

Such "artificial" constraints are obviously long past... yet still in active use.

I would argue these are also a form of abstract language... certainly a form of language that many people have to wrestle with daily.

I disagree that these example comprise ABSTRACT language. As I explained above, it is only the lexical scoping of the claim into elements that is needed to partition the basic claim sentence. The abstraction of Noun, Adjective, Subject, preposition, etc need not even be considered for this purpose, yet those are the fundamental objects which linguists use to process language.

Thanks for your inputs,

-Rich

___________________

David Eddy

deddy@xxxxxxxxxxxxx


_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (01)

<Prev in Thread]	Current Thread	[Next in Thread>
Re: [ontolog-forum] Run, put, and set, Rich Cooper Re: [ontolog-forum] Run, put, and set, sowa Re: [ontolog-forum] Run, put, and set, Rich Cooper Re: [ontolog-forum] Run, put, and set, David Eddy Re: [ontolog-forum] Run, put, and set, Rich Cooper <= Re: [ontolog-forum] Run, put, and set, sowa Re: [ontolog-forum] Run, put, and set, Rich Cooper Re: [ontolog-forum] Run, put, and set, sowa Re: [ontolog-forum] Run, put, and set, Rich Cooper Re: [ontolog-forum] Run, put, and set, Obrst, Leo J. Re: [ontolog-forum] Run, put, and set, sowa

Previous by Date:	Re: [ontolog-forum] Run, put, and set, David Eddy
Next by Date:	Re: [ontolog-forum] Google, Microsoft, and Yahoo Team Up to Advance Semantic Web, Kingsley Idehen
Previous by Thread:	Re: [ontolog-forum] Run, put, and set, David Eddy
Next by Thread:	Re: [ontolog-forum] Run, put, and set, sowa
Indexes:	[Date] [Thread] [Top] [All Lists]