Re: [ontolog-forum] [Corpora-List] What support should a corpus provide

To:	<corpora@xxxxxx>
Cc:	"'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From:	"Rich Cooper" <rich@xxxxxxxxxxxxxxxxxxxxxx>
Date:	Sat, 9 Aug 2014 17:44:29 -0700
Message-id:	<!&!AAAAAAAAAAAYAAAAAAAAAAb3x6NyrzVKo6ReWvn+7BjCgAAAEAAAAHNttV063X5MiD6eNgeEFZMBAAAAAA==@xxxxxxxxxxxxxxxxxxxxxx>

When developing ontologies to match corpus samples, as in learning algorithms, what kinds of analysis of each document would be useful for comparing one patent claim against that patent’s description, and against an arbitrary candidate for prior art?

Entity recognition, with and without names, descriptions, or anaphora;

Objects and activities mentioned in the claims, as compared to those mentioned in each patent;

Mereological relationships among the identified objects and activities;

Common verb signature database with identified variables and constants;

Modus ponens interpreter of signature phrases with respect to the identified objects and activities;

Logic language at the FOL level, Horn clauses, lexical scopes, question answering;

Heuristic search through AND/OR graphs with FOL parameterization, simple algebra.
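
To make the modus ponens item concrete, here is a minimal sketch in Python of forward chaining over Horn-clause rules, applied to part-of (mereological) facts. It is only an illustration under assumptions: the tuple representation, the "?x" variable convention, and the names match, match_all and forward_chain are hypothetical and are not taken from any existing tool.

```python
def match(pattern, fact, bindings):
    """Match one pattern against one ground fact.

    Terms starting with "?" are variables. Returns the extended
    bindings, or None if the pattern does not fit the fact.
    """
    if len(pattern) != len(fact):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


def match_all(premises, facts, bindings):
    """Yield every set of bindings that satisfies all premises."""
    if not premises:
        yield bindings
        return
    first, rest = premises[0], premises[1:]
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from match_all(rest, facts, extended)


def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # list() exhausts the generator before `known` is modified.
            for b in list(match_all(premises, known, {})):
                derived = tuple(b.get(t, t) for t in conclusion)
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known


# Hypothetical facts, as they might be extracted from a claim.
FACTS = [
    ("part_of", "blade", "rotor"),
    ("part_of", "rotor", "turbine"),
]

# One Horn clause: part_of(x, y) and part_of(y, z) implies part_of(x, z).
RULES = [
    ([("part_of", "?x", "?y"), ("part_of", "?y", "?z")],
     ("part_of", "?x", "?z")),
]

if __name__ == "__main__":
    for fact in sorted(forward_chain(FACTS, RULES)):
        print(fact)
```

The sketch only matches patterns against ground facts, so it stops well short of full FOL unification, and it has no search heuristics; both would be needed for the AND/OR graph item.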

What have I missed?

The idea, or long-term goal, is to build an ontology of patent claims as encountered in published patents. If that turns out to be helpful, other document analysis tasks might benefit from the ontology so developed.

-Rich

Sincerely,

Rich Cooper

EnglishLogicKernel.com

Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2

From: corpora-bounces@xxxxxx [mailto:corpora-bounces@xxxxxx] On Behalf Of Rich Cooper
Sent: Friday, August 08, 2014 11:12 AM
To: 'John F Sowa'; corpora@xxxxxx
Cc: '[ontolog-forum] '
Subject: [Corpora-List] What support should a corpus provide?

Dear Corpus Analysts and Ontologists,

I have just made available, for corpus analysts, a corpus of documents from the US Patent and Trademark Office. The tools available now are sufficient for supporting attorneys, inventors, scientists, and similar roles in legal and technology applications.

What additional support should I provide in the software for supporting corpus analysis of selected patent document subsets? I have a web site with extensive help and tutorial materials – I suggest starting at:

www.EnglishLogicKernel.com/Help/help.htm

to see an index of capability descriptions. I can make available the “frequent words” and the “rare words” lists as text files, along with the patent documents in whole or in sections for data, abstract, description and claims, which are already extracted from the selected document set. The claim tree is parsed, and the claims are separated into claim elements, all of which can be provided.
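
For readers unfamiliar with claim trees, the following Python sketch shows one plausible way to derive a claim tree and claim elements from raw claim text. It is not a description of how the software above works: the regular expression, the semicolon-splitting rule, the function names and the sample claims are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Matches single dependency phrases such as "of claim 1" or
# "according to claim 3". Multiple dependencies ("claims 1 or 2")
# are not handled and are treated as independent claims.
CLAIM_REF = re.compile(r"\bclaim\s+(\d+)", re.IGNORECASE)


def build_claim_tree(claims):
    """Map each claim number to the claims that depend directly on it.

    `claims` is a dict {number: text}. Independent claims are listed
    under the artificial root key 0.
    """
    children = defaultdict(list)
    for number in sorted(claims):
        ref = CLAIM_REF.search(claims[number])
        parent = int(ref.group(1)) if ref else 0
        children[parent].append(number)
    return dict(children)


def split_elements(text):
    """Split a claim body into elements at semicolons.

    Assumes the common drafting style "preamble: element; element;
    and element." and drops a leading "and" from each element.
    """
    body = text.split(":", 1)[-1]
    parts = [p.strip(" .\n") for p in body.split(";")]
    return [re.sub(r"^and\s+", "", p) for p in parts if p]


# Invented sample claims, for illustration only.
SAMPLE_CLAIMS = {
    1: "A widget comprising: a housing; a rotor mounted in the housing; "
       "and a blade attached to the rotor.",
    2: "The widget of claim 1, wherein the blade is steel.",
    3: "The widget of claim 2, wherein the housing is sealed.",
    4: "A method of making a widget comprising: forming a housing; "
       "and mounting a rotor.",
}

if __name__ == "__main__":
    print(build_claim_tree(SAMPLE_CLAIMS))
    print(split_elements(SAMPLE_CLAIMS[1]))
```

Real claims are drafted less regularly than these samples, so a production parser would need to handle multiple dependencies, nested sub-elements and colons inside element text.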

Is there anything else that corpus analysts would like to see in the software?

Suggestions highly appreciated,