[ontolog-forum] What support should a corpus provide?

Date: Fri, 8 Aug 2014 11:12:00 -0700
Dear Corpus Analysts and Ontologists,


I have just made available, for corpus analysts, a corpus of documents from the US Patent and Trademark Office.  The tools now available are sufficient to support attorneys, inventors, scientists, and others in similar legal and technology roles. 


What additional features should I provide in the software to support corpus analysis of selected subsets of patent documents?  I have a web site with extensive help and tutorial materials; I suggest starting at:




to see an index of capability descriptions.  I can make the "frequent words" and "rare words" lists available as text files, along with the patent documents, either whole or split into their data, abstract, description, and claims sections, all of which are already extracted from the selected document set.  The claim tree is also parsed, and the claims are separated into claim elements; all of these can be provided as well. 


Is there anything else that corpus analysts would like to see in the software?


Suggestions highly appreciated,




Rich Cooper


