Re: [ontolog-forum] Science, Statistics and Ontology

To:	"'Len Yabloko'" <lenyabloko@xxxxxxxxx>, "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From:	"Rich Cooper" <rich@xxxxxxxxxxxxxxxxxxxxxx>
Date:	Sun, 13 Nov 2011 12:21:40 -0800
Message-id:	<AB93C668C08549EFA4E8B2A77E743EAA@Gateway>

Dear Len,

I am not entirely sure who wrote the following quote (Len or Simon?) but I would like to replay it for a moment, from the perspective of a learning algorithm which, unattended, senses the environment and is preprogrammed to perform discovery. Your quote, seen from this perspective, is below:

If I were to distil the article into three broad themes (incidentally, summarized quite well here http://xkcd.com/882/ ), it'd be:

1. Misunderstanding the theory behind the statistics

This one seems to be just a math error, unless you also mean that errors in this category include searching for a logical combination of sensed categories that can be used in discovery. So unless I misunderstand the point you are making in 1, I consider this category of error fixable in the algorithmic implementation. Please correct me if you meant something deeper than that.

2. Incorrectly combining hypotheses (especially from a statistical perspective - but it rests on semantic misinterpretations)

The only combinations used in my hypothetical discovery algorithm would be AND, OR and NOT. All other forms of composition are, of course, just logical combinations of hypotheses, so it appears that the discovery algorithm would be immune to this category of error as well.

3. Incorrectly revising hypotheses

Again, an algorithm that has been preprogrammed to revise hypotheses in the face of sensed evidence can only try the possible revisions one at a time, each constrained to be logically correct. So I find that this error, too, would be eliminated in a properly preprogrammed discovery algorithm.
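As a hedged sketch of the discovery algorithm described in points 2 and 3: hypotheses are predicates over sensed categories, composed only with AND, OR and NOT, and revised by trying candidate revisions one at a time until one is consistent with the evidence. The encoding (a dictionary of detected categories), the function names and the "first consistent candidate wins" policy are illustrative assumptions, not part of any system described in this thread.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical encoding: an observation records which sensed categories were detected.
Observation = Dict[str, bool]
Hypothesis = Callable[[Observation], bool]
# Evidence pairs an observation with the outcome actually sensed alongside it.
Evidence = List[Tuple[Observation, bool]]


def sensed(category: str) -> Hypothesis:
    """Atomic hypothesis: the named category was detected in the observation."""
    return lambda obs: obs.get(category, False)


def and_(h1: Hypothesis, h2: Hypothesis) -> Hypothesis:
    # Conjunction of two hypotheses.
    return lambda obs: h1(obs) and h2(obs)


def or_(h1: Hypothesis, h2: Hypothesis) -> Hypothesis:
    # Disjunction of two hypotheses.
    return lambda obs: h1(obs) or h2(obs)


def not_(h: Hypothesis) -> Hypothesis:
    # Negation of a hypothesis.
    return lambda obs: not h(obs)


def consistent(h: Hypothesis, evidence: Evidence) -> bool:
    """True if the hypothesis predicts every recorded outcome correctly."""
    return all(h(obs) == outcome for obs, outcome in evidence)


def revise(current: Hypothesis, candidates: List[Hypothesis],
           evidence: Evidence) -> Optional[Hypothesis]:
    """Keep the current hypothesis if it fits; otherwise try revisions one at a time."""
    if consistent(current, evidence):
        return current
    for candidate in candidates:
        if consistent(candidate, evidence):
            return candidate  # first consistent revision wins
    return None  # no candidate survives the evidence


if __name__ == "__main__":
    evidence = [
        ({"green": True, "cough": False}, False),
        ({"green": True, "cough": True}, True),
        ({"green": False, "cough": True}, False),
    ]
    candidates = [sensed("cough"), and_(sensed("green"), sensed("cough"))]
    chosen = revise(sensed("green"), candidates, evidence)
    print(candidates.index(chosen))  # 1: "green AND cough" is the first revision that fits
```

The exact-fit test in `consistent` is the simplest possible choice; a system facing noisy sensors would need a statistical criterion instead, and the order of `candidates` determines which revision gets chosen.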

What am I missing in your opinion?

-Rich

Sincerely,

Rich Cooper

EnglishLogicKernel.com

Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2

-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Len Yabloko
Sent: Sunday, November 13, 2011 9:02 AM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] Science, Statistics and Ontology

Dear Len,

Thanks for the feedback.

On Thu, Nov 10, 2011 at 6:08 PM, Len Yabloko <lenyabloko@xxxxxxxxx> wrote:

Ali,

>Thank you for the post. I am not an "ontologist", but ontological questions are inevitable if one is to make any sense of reality. They are also inevitable in any scientific method. The real issue is efficiency, and that has always been an issue with data analysis.

I'm not entirely sure how you came to this conclusion from this article. If I were to distil the article into three broad themes (incidentally, summarized quite well here http://xkcd.com/882/ ), it'd be:

1. Misunderstanding the theory behind the statistics

2. Incorrectly combining hypotheses (especially from a statistical perspective - but it rests on semantic misinterpretations)

3. Incorrectly revising hypotheses

Efficiency to me seems more ancillary than the semantic concerns in each. Admittedly, (1) arises from "just" misunderstanding and misapplying the basic theories underpinning entire experimental regimes. But (2) is really about incompletely combining two or more distinct but overlapping conceptualizations (the experimental design and the hypothesis).
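One concrete instance of theme (1), and the subject of the xkcd strip cited above, is the multiple-comparisons problem. The sketch below assumes independent tests that each use the same threshold and whose null hypotheses are all true; it computes the chance of at least one spurious "significant" result, and the Bonferroni-adjusted per-test threshold. Bonferroni is just one standard correction, chosen here for simplicity, and the function names are illustrative.

```python
def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive among independent tests whose nulls are all true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    # Twenty jelly-bean colours, each tested separately at p < 0.05.
    print(round(family_wise_error_rate(0.05, 20), 3))  # 0.642
    corrected = bonferroni_threshold(0.05, 20)
    print(round(family_wise_error_rate(corrected, 20), 3))  # 0.049
```

An algorithm can apply such a correction mechanically, but only if it counts every hypothesis it actually tested, including the ones it discarded.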

>Nature itself seems to have opted for statistical processing first, with logic following from it. The two sides can never be completely separated, but the balance is a matter of efficiency, not principle.

Could you elaborate what you mean by efficiency in this context? I certainly agree that there are a multitude of other issues at play.

Dear Ali,

While these problems can be discussed in great technical detail, they are still rooted in the original compromise: replacing the explanation of observations in terms of cause and effect with statistical explanation. The way I see it, this compromise greatly increased the efficiency of scientific investigation and engineering beyond what could be analysed as a mechanical chain of events. The price, however, was replacing the ontological substance of observation with a statistical one. All subsequent development of the scientific method has had to recover the ontology from sequences of observations. Never mind that David Hume thought there was not any cause and effect to begin with. The inductive method in science is necessary because of limited deductive power. Kolmogorov's method can be seen as an attempt to bring back some of that power in the form of "prior knowledge". But it does not (IMHO) unify the two original methods. This remains a challenge that goes beyond "fixing" or improving some techniques. At the same time, one can consider human evolution as a natural solution for combining the methods with the maximum efficiency dictated by selection.

However, one area which the article highlighted were some of the difficulties in combining hypotheses, whereby the meta-analysis must integrate different protocols, methodologies and vocabularies. When one attempts to statistically combine a set of experiments that are based on testing for a null hypothesis, it is important that the vocabularies, methodologies and even hypotheses in each experimental set up be correctly aligned. This point was a takeaway for me, when they wrote:

[quote] For one thing, all the studies conducted on the drug must be included, published and unpublished. And all the studies should have been performed in a similar way, using the same protocols, definitions, types of patients and doses. When combining studies with differences, it is necessary first to show that those differences would not affect the analysis, Goodman notes, but that seldom happens. “That’s not a formal part of most meta-analyses,” he says.

...

“Across the trials, there was no standard method for identifying or validating outcomes; events ... may have been missed or misclassified,” Bruce Psaty and Curt Furberg wrote in an editorial accompanying the New England Journal report. “A few events either way might have changed the findings.” [/quote]

So there's the issue of clearly understanding the implications of each experiment and its data individually, and then trying to tease out in what ways these accounts can be combined. They also make a point that logicians know well: unpublished results or failures are just as important in trying to determine the models that satisfy a theory.
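To make the point about unpublished results concrete, here is a minimal fixed-effect (inverse-variance) meta-analysis. It is one standard pooling method, not necessarily the one used in the studies the article discusses, and the effect sizes and standard errors are invented purely for illustration. A fixed-effect model assumes every study estimates the same underlying effect, which is exactly the alignment of protocols and definitions that the quoted passage says is seldom checked.

```python
import math
from typing import List, Tuple

# (effect estimate, standard error) per study; these numbers are invented for illustration only.
Study = Tuple[float, float]


def fixed_effect_pool(studies: List[Study]) -> Tuple[float, float]:
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / total
    return pooled, math.sqrt(1.0 / total)


if __name__ == "__main__":
    published = [(0.40, 0.20), (0.30, 0.15)]
    unpublished = [(0.00, 0.10), (-0.05, 0.20)]  # hypothetical null-ish results left in a drawer
    est_pub, _ = fixed_effect_pool(published)
    est_all, _ = fixed_effect_pool(published + unpublished)
    print(round(est_pub, 2), round(est_all, 2))  # 0.34 0.11
```

Leaving out the two hypothetical unpublished studies roughly triples the pooled effect in this toy case, which is the logician's point about failures carrying information.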

In a very literal (though implicit) way, each lab is deploying its own ontology regarding what procedures make sense (and what those commitments entail), but also what variables are thought significant, what they tried to control for and what was out of the scope of the experiment (and why). It quickly becomes very messy when trying to integrate results across two (let alone multiple) labs, unless one is directly replicating an experiment. In some cases, each hypothesis may carry implicit assumptions that are reflected in how the experiment is ultimately performed.
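A toy illustration of that alignment problem: each lab's local variable names are mapped by hand to a shared vocabulary, and the sketch reports which shared terms both labs cover, which appear on only one side, and which local terms could not be mapped at all. All term names and mappings are hypothetical, and real protocol ontologies do far more than a lookup table.

```python
from typing import Dict, Set, Tuple


def align(
    terms_a: Set[str], map_a: Dict[str, str],
    terms_b: Set[str], map_b: Dict[str, str],
) -> Tuple[Set[str], Set[str], Set[str], Set[str]]:
    """Map two labs' local terms onto a shared vocabulary and report where they agree."""
    shared_a = {map_a[t] for t in terms_a if t in map_a}
    shared_b = {map_b[t] for t in terms_b if t in map_b}
    common = shared_a & shared_b       # comparable across both labs
    one_sided = shared_a ^ shared_b    # mapped, but only one lab has it
    unmapped_a = terms_a - set(map_a)  # local terms with no agreed meaning
    unmapped_b = terms_b - set(map_b)
    return common, one_sided, unmapped_a, unmapped_b


if __name__ == "__main__":
    lab_a = {"dose_mg", "age_yrs", "mi_event"}
    map_a = {"dose_mg": "dose", "age_yrs": "age", "mi_event": "myocardial_infarction"}
    lab_b = {"dosage", "patient_age", "cardiac_event", "smoker"}
    map_b = {"dosage": "dose", "patient_age": "age", "cardiac_event": "cardiovascular_event"}
    common, one_sided, unmapped_a, unmapped_b = align(lab_a, map_a, lab_b, map_b)
    print(sorted(common), sorted(one_sided), sorted(unmapped_b))
```

Here the two outcome variables map to different shared terms, so the labs' event counts cannot simply be pooled; that is the kind of mismatch the Psaty and Furberg quote describes.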

In cases where one is not replicating an experiment but trying to extend a hypothesis or allow for other factors, these differences across the experiments need to be accounted for; at the very least, to the minimum level of analysis that reveals their statistical implications.
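One common, if partial, way to check whether differences across experiments show up statistically is Cochran's Q together with the I² statistic. The sketch below computes both from per-study estimates and standard errors using inverse-variance weights; the numbers are invented. A small Q does not show that protocols and definitions actually agree, only that the pooled numbers do not reveal the disagreement.

```python
from typing import List, Tuple

Study = Tuple[float, float]  # (effect estimate, standard error); illustrative values only


def heterogeneity(studies: List[Study]) -> Tuple[float, float]:
    """Return Cochran's Q and I^2 for a set of studies."""
    weights = [1.0 / se ** 2 for _, se in studies]
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (est - pooled) ** 2 for w, (est, _) in zip(weights, studies))
    df = len(studies) - 1
    # I^2: share of total variation attributed to between-study differences, floored at 0.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


if __name__ == "__main__":
    studies = [(0.40, 0.20), (0.30, 0.15), (0.00, 0.10), (-0.05, 0.20)]
    q, i2 = heterogeneity(studies)
    print(round(q, 2), round(i2, 2))  # 5.55 0.46
```

Q is usually compared against a chi-square distribution with `len(studies) - 1` degrees of freedom; that step is omitted here to keep the sketch dependency-free.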

To some extent, this is being addressed for certain cultural subgroups in various parts of science through the establishment of a variety of protocol libraries or standards. Indeed, in some cases, ontologies are being developed to help align protocols [1], [2], [3], [4]. But it's often taken for granted that people are correctly interpreting the intent of authors through the papers that are published. IMO, as this article points out, errors abound, and many practitioners make poor semantic mappings across experiments in the same domain.

Lastly, especially in the age of micro-publishing, when novel data are published that "break" an existing ontology and/or call for a revision, perhaps the ontology update cycle is better served through Bayesian revision. This seems applicable more on a case-by-case basis, but the principle seems worth investigating.
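The Bayesian-revision idea is left open above. As one hedged reading, the sketch below treats a small set of competing ontology versions as discrete hypotheses and applies Bayes' rule each time a new finding arrives. The version labels, priors and likelihoods are hypothetical; estimating those likelihoods is the hard, case-by-case part.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior over hypotheses after one observation, by Bayes' rule."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())
    if evidence == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / evidence for h, p in unnormalised.items()}


if __name__ == "__main__":
    belief = {"current_ontology": 0.8, "revised_ontology": 0.2}
    # Hypothetical likelihood of each newly published finding under each version.
    findings = [
        {"current_ontology": 0.1, "revised_ontology": 0.6},
        {"current_ontology": 0.3, "revised_ontology": 0.5},
    ]
    for likelihood in findings:
        belief = bayes_update(belief, likelihood)
        print({h: round(p, 3) for h, p in belief.items()})
```

Each posterior becomes the prior for the next finding, so the ontology is revised incrementally rather than replaced wholesale.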

[1] Michel Kinsy, Zoé Lacroix, Christophe Legendre, Piotr Wlodarczyk and Nadia Yacoubi. ProtocolDB: Storing Scientific Protocols with a Domain Ontology. In WISE 2007 Workshops, 2007.

[2] http://bioinformatics.eas.asu.edu/siteProtocolDB/protocolDB.htm

[3] A. Maccagnan, M. Riva, E. Feltrin, B. Simionati, T. Vardanega, G. Valle and N. Cannata. Combining ontologies and workflows to design formal protocols for biological laboratories. Automated Experimentation, 2010.

[4] Larisa N. Soldatova, Wayne Aubrey, Ross D. King and Amanda Clare. The EXACT description of biomedical protocols. Bioinformatics, 2008.
