When developing ontologies to match corpus samples, as in learning algorithms, what kinds of analysis of each document would be useful for comparing one patent claim against that patent’s description, and against an arbitrary prior art candidate?
Entity recognition, with and without names, descriptions, or anaphora;
Objects and activities mentioned in the claims, as compared to those mentioned in each patent;
Mereological relationships among the identified objects and activities;
A database of common verb signatures with identified variables and constants;
A modus ponens interpreter of signature phrases with respect to the identified objects and activities;
A logic language at the FOL level, Horn clauses, lexical scopes, question answering;
Heuristic search through AND/OR graphs with FOL parameterization, simple algebra.
What have I missed?
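To make the modus ponens and mereology items above concrete, here is a minimal sketch in Python of forward chaining over Horn clauses. Everything in it is an assumption for illustration: the predicate name part_of, the example objects, the "?x" variable convention, and the function names are hypothetical and are not taken from any existing software.

```python
def is_var(term):
    # Assumption: variables are strings that start with "?".
    return isinstance(term, str) and term.startswith("?")

def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact; return None on failure."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result

def all_bindings(premises, facts, bindings=None):
    """Yield every variable binding that satisfies all premises."""
    bindings = {} if bindings is None else bindings
    if not premises:
        yield bindings
        return
    first, rest = premises[0], premises[1:]
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from all_bindings(rest, facts, extended)

def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Collect bindings first so the set is not modified mid-iteration.
            for b in list(all_bindings(premises, known)):
                new_fact = tuple(b.get(t, t) for t in conclusion)
                if new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known

# Hypothetical mereological facts extracted from a claim.
facts = [("part_of", "blade", "rotor"), ("part_of", "rotor", "turbine")]
# Horn clause: part_of(x, y) AND part_of(y, z) -> part_of(x, z)
rules = [([("part_of", "?x", "?y"), ("part_of", "?y", "?z")],
          ("part_of", "?x", "?z"))]

derived = forward_chain(facts, rules)
print(("part_of", "blade", "turbine") in derived)  # True
```

A real interpreter would need negation handling, indexing of facts, and a way to tie each derived fact back to the claim element it came from; this sketch only shows the inference step.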
The idea, or long-term goal, is to build an ontology of patent claims as encountered in published patents. If that turns out to be helpful, other document analysis tasks might benefit from the ontology so developed.
-Rich
Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2
From: corpora-bounces@xxxxxx [mailto:corpora-bounces@xxxxxx] On Behalf Of Rich Cooper
Sent: Friday, August 08, 2014 11:12 AM
To: 'John F Sowa'; corpora@xxxxxx
Cc: '[ontolog-forum] '
Subject: [Corpora-List] What support should a corpus provide?
Dear Corpus Analysts and Ontologists,
I have just made available a corpus of documents from the US Patent and Trademark Office for use by corpus analysts. The tools available now are sufficient for supporting attorneys, inventors, scientists, and those in similar application-oriented legal and technology roles.
What additional support should the software provide for corpus analysis of selected patent document subsets? I have a web site with extensive help and tutorial materials; I suggest starting at:
www.EnglishLogicKernel.com/Help/help.htm
to see an index of capability descriptions. I can make available the “frequent words” and the “rare words” lists as text files, along with the patent documents in whole or in sections for data, abstract, description and claims, which are already extracted from the selected document set. The claim tree is parsed, and the claims are separated into claim elements, all of which can be provided.
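For anyone who wants to cross-check such word lists against their own counts, here is a minimal Python sketch of one way to derive frequent and rare words from the extracted sections. The cutoffs (top_n, rare_max), the tokenization rule, and the function names are assumptions for illustration only; they are not the definitions the software itself uses.

```python
import re
from collections import Counter

def word_counts(text):
    # Assumption: a "word" is a run of ASCII letters, compared case-insensitively.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def frequent_and_rare(sections, top_n=2, rare_max=1):
    """Return (frequent, rare) word lists over all sections of one document.

    sections maps a section name (e.g. "abstract", "claims") to its text.
    frequent holds the top_n most common words; rare holds the words that
    occur at most rare_max times, sorted alphabetically.
    """
    totals = Counter()
    for text in sections.values():
        totals.update(word_counts(text))
    frequent = [word for word, _ in totals.most_common(top_n)]
    rare = sorted(word for word, count in totals.items() if count <= rare_max)
    return frequent, rare

# Hypothetical miniature document.
sections = {"abstract": "a rotor with a blade", "claims": "a rotor"}
frequent, rare = frequent_and_rare(sections)
print(frequent)  # ['a', 'rotor']
print(rare)      # ['blade', 'with']
```

In practice the lists would more likely be computed across the whole selected document set rather than per document, and stop words would probably be filtered before ranking.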
Is there anything else that corpus analysts would like to see in the software?
Suggestions highly appreciated,
-Rich