
Re: [ontolog-forum] Ontology of Rough Sets

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Rich Cooper" <rich@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 19 Jan 2011 20:28:32 -0800
Message-id: <20110120042836.4B0BE138CE6@xxxxxxxxxxxxxxxxx>

Hi John,


In essence, you are correct - data mining is not sufficient, but it does provide a very capable way of doing the OBSERVATION part of semantic discovery. 


It's true that data mining finds only patterns: not causal ones, nor even necessarily ones that support inference, just patterns that repeat in the data.
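To make "patterns that repeat in the data" concrete, here is a minimal sketch in Python that counts how often pairs of terms occur together in the same document. The function name `cooccurrence_counts`, the toy documents, and the threshold of 2 are all assumptions made for illustration; none of them comes from the patent or from any particular data mining tool.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how often each unordered pair of terms shares a document."""
    counts = Counter()
    for doc in documents:
        # Use the set of terms so a pair counts at most once per document,
        # and sort so each pair has one canonical ordering.
        terms = sorted(set(doc.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


# Hypothetical toy corpus, invented for this example.
docs = [
    "fever cough fatigue",
    "fever cough",
    "cough fatigue",
    "fever rash",
]

pairs = cooccurrence_counts(docs)

# Keep only the pairs that repeat (assumed threshold: at least 2 documents).
frequent = {pair: n for pair, n in pairs.items() if n >= 2}
print(frequent)
```

The output is only a list of repeating pairs. Nothing in it says why the terms occur together, which is the limitation under discussion.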


There is no pattern in the data itself – the observer actively discovers patterns.  The important issue for pattern discovery is validation - does the pattern usefully identify anything that can be observed by proxy?  To answer that, one must validate some model of the pattern, in its actual variety, on actual instances of data – observations from reality – not on instances generated from the model.  The data is about reality; the model is a useful association with reality.
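One way to picture that validation step is to score a candidate pattern against observations that were not used to find it. The sketch below assumes a pattern can be expressed as a yes/no test on a piece of text and that each held-out observation carries a human judgment of whether it really is an instance. The names `validate_pattern`, `pattern` and `held_out`, and the data, are hypothetical.

```python
def validate_pattern(pattern, observations):
    """Return (precision, recall) of a pattern over labeled observations.

    pattern      -- function taking a text and returning True or False
    observations -- iterable of (text, is_instance) pairs from real data
    """
    tp = fp = fn = 0
    for text, is_instance in observations:
        matched = pattern(text)
        if matched and is_instance:
            tp += 1          # pattern fired on a true instance
        elif matched and not is_instance:
            fp += 1          # pattern fired where it should not have
        elif not matched and is_instance:
            fn += 1          # pattern missed a true instance
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical pattern: both terms appear in the text.
def pattern(text):
    words = text.split()
    return "fever" in words and "cough" in words


# Hypothetical held-out observations with human judgments.
held_out = [
    ("fever cough chills", True),
    ("fever cough", False),
    ("cough fatigue", True),
    ("rash", False),
]

precision, recall = validate_pattern(pattern, held_out)
print(precision, recall)
```

A pattern that scores poorly here repeats in the mined data but does not reliably identify anything in fresh observations.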


We use OBSERVATION processes to try to identify each pattern and pair it with some meaningful objects, activities or events in the universe of discourse, so that the patterns can be understood in the context we found them in.  That presumes some concept of classes, which gets richer as we pursue validated activities.  Each activity refines the definitions of the classes applied so far, with the new refinements added judiciously to the model.


We use CLASSIFICATION to organize the patterns, and pattern components, into classes that help us remember the characteristics of whole groups of pattern instances.  That refines our ability to do valid identification.  Formal concept analysis (FCA), as you mentioned, is one of the algorithms available for implementing this.
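For readers who have not used FCA, here is a minimal sketch of what it computes. A formal concept is a pair (extent, intent): a set of objects and the set of attributes they all share, such that no further object has all of those attributes. The code enumerates concepts by brute force over subsets of objects, which is only practical for tiny examples; it is an illustration, not the method described in the patent. The context and the names `formal_concepts`, `p1`, `p2`, `p3` are assumptions.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a small context by brute force.

    context -- dict mapping each object to its set of attributes
    Returns a set of (extent, intent) pairs of frozensets.
    """
    objects = list(context)
    all_attrs = set().union(*context.values())

    def common_attrs(objs):
        # Attributes shared by every object in objs
        # (all attributes when objs is empty).
        shared = set(all_attrs)
        for obj in objs:
            shared &= context[obj]
        return shared

    def objects_with(attrs):
        # Objects that have every attribute in attrs.
        return {obj for obj in objects if attrs <= context[obj]}

    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attrs(subset)
            extent = objects_with(intent)  # closure of the subset
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical context: three pattern instances and their components.
context = {
    "p1": {"fever", "cough"},
    "p2": {"fever", "cough", "fatigue"},
    "p3": {"cough", "fatigue"},
}

concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents gives the concept lattice, which is the class structure that the CLASSIFICATION step would then refine.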


The THEORIZATION step involves describing WHAT the patterns, and instances of patterns, MEAN – the context as viewed by an observer, again represented by proxy.  A pattern may not mean anything at all, or it may bring useful insights about the world.  Theorizing can be done completely automatically, but much of this step’s effectiveness depends on the kind of insight that people are good at and algorithms, so far, are not.  Computer assistance is a leverage factor for some applications.  Designing relevant heuristic functions for each domain may be useful.


With theories, classes and observations, we can design and carry out EXPERIMENTATION steps to confirm or deny the theories, or to create new observations to drive classification and theorization.  Experiments provide feedback to refine observations, theories and classes, or even to suggest new ones.  


All four processes in a text discovery project - experimenting, observing, theorizing and classifying - are used in addition to data mining to get the full results.  Figures 13 and 14 in the patent describe the processes involving Experimenting, Classifying, Observing and Theorizing. 


The patent figures (Figs 13 and 14) are too large to show in this list due to email size restrictions.  If you would like to investigate the description of the four processes of discovery, see




and read the sections on Figures 13 and 14, which describe it in detail. 






Rich Cooper


Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2

-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John F. Sowa
Sent: Wednesday, January 19, 2011 5:49 AM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] Ontology of Rough Sets


On 1/19/2011 12:01 AM, Rich Cooper wrote:

> But instances DO define the types, WITH/USING pattern proxies.
> That works well, in text mining, linguistics, and the social and medical
> sciences.
>
> I think it's the distinction between empirical sciences (instances to types)
> and so called pure sciences, based on very limited views of reality (types
> to instances).


This gets into critical issues about scientific methodology and
the problems with blindly using the results of data mining.

What you get from data mining is *not* a definition of a new type
from instances.  What you get is a "co-occurrence pattern" or
a "correlation".  Such patterns can often be clues to a useful
type definition, but they can often be misleading or worse.


That's why scientists don't accept an observed correlation as
a law until it (a) has made reliable predictions of future
observations, and (b) has been connected by reasonable chains
of inference to other established laws.


But I'll grant that for many applications, such as sending
junk mail, the cost of testing the correlation is higher than
the cost of dumping unwanted mail on people for whom the
prediction fails.



Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
