
Re: [ontolog-forum] Science, Statistics and Ontology

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Ali SH <asaegyn+out@xxxxxxxxx>
Date: Thu, 10 Nov 2011 21:25:00 -0500
Message-id: <CADr70E2f3=iM5v_pd3gAr1RF3mcPBtux2ByFi3Wz3jEYfdb2FQ@xxxxxxxxxxxxxx>
Dear Len,

Thanks for the feedback.

On Thu, Nov 10, 2011 at 6:08 PM, Len Yabloko <lenyabloko@xxxxxxxxx> wrote:
Thank-you for the post. I am not "ontologist" but ontological questions are inevitable if one is to make any sense of reality. It is also inevitable in any scientific method. The real issue is efficiency and that was always an issue with data analysis.

I'm not entirely sure how you came to this conclusion from this article. If I were to distil the article into three broad themes (incidentally, summarized quite well here http://xkcd.com/882/ ), it'd be:
  1. Misunderstanding the theory behind the statistics
  2. Incorrectly combining hypotheses (especially from a statistical perspective - but it rests on semantic misinterpretations)
  3. Incorrectly revising hypotheses
Efficiency seems to me ancillary to the semantic concerns in each. Admittedly, (1) arises from "just" misunderstanding and misapplying the basic theories underpinning entire experimental regimes. But (2) is really about incompletely combining two or more distinct but overlapping conceptualizations (the experimental design and the hypothesis).
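
To make theme (1) concrete, here is a minimal sketch of the multiple-comparisons effect. It assumes the tests are independent and that every null hypothesis is true; the function names and the choice of twenty tests are mine, purely for illustration.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests,
    assuming every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha, n_tests = 0.05, 20  # illustrative values, not taken from the article
    print(f"uncorrected: {familywise_error_rate(alpha, n_tests):.3f}")
    corrected = bonferroni_threshold(alpha, n_tests)
    print(f"per-test threshold after correction: {corrected:.4f}")
    print(f"corrected:   {familywise_error_rate(corrected, n_tests):.3f}")
```

Under these assumptions, twenty uncorrected tests at the 0.05 level produce at least one spurious "finding" more often than not, which is why the meaning of each individual p-value depends on how many hypotheses were tested alongside it.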

The nature itself seem to have opted for statistical processing first and logic following from it. The two sides can never be completely separated but the balance is a matter of efficiency - not principle.

Could you elaborate on what you mean by efficiency in this context? I certainly agree that there are a multitude of other issues at play.

However, one thing the article highlighted was the difficulty of combining hypotheses, where a meta-analysis must integrate different protocols, methodologies and vocabularies. When one attempts to statistically combine a set of experiments that each test a null hypothesis, it is important that the vocabularies, methodologies and even hypotheses in each experimental setup be correctly aligned. This point was a takeaway for me when they wrote:

[quote] For one thing, all the studies conducted on the drug must be included — published and unpublished. And all the studies should have been performed in a similar way, using the same protocols, definitions, types of patients and doses. When combining studies with differences, it is necessary first to show that those differences would not affect the analysis, Goodman notes, but that seldom happens. “That’s not a formal part of most meta-analyses,” he says.
“Across the trials, there was no standard method for identifying or validating outcomes; events ... may have been missed or misclassified,” Bruce Psaty and Curt Furberg wrote in an editorial accompanying the New England Journal report. “A few events either way might have changed the findings.” [/quote]

So there's the issue of clearly understanding what the implications of each experiment and its data are individually, and then trying to tease out in what ways these accounts can be combined. They also make a point that logicians know well: unpublished results or failures are just as important in trying to determine the models that satisfy a theory.
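
As a rough illustration of checking whether studies can be combined at all, here is a sketch of a standard fixed-effect (inverse-variance) pooling together with Cochran's Q and the I² heterogeneity statistic. This is a textbook technique that I am swapping in for illustration, not something the article prescribes, and the function name and inputs are hypothetical.

```python
from math import sqrt


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance pooling of per-study effect estimates.

    Returns (pooled_effect, pooled_se, cochran_q, i_squared).
    A large Q or I-squared suggests the studies disagree by more than
    sampling error alone would explain, so a single pooled number may mislead.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of the variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


if __name__ == "__main__":
    # Made-up numbers: two studies that clearly disagree.
    print(pool_fixed_effect([0.0, 1.0], [0.1, 0.1]))
```

The statistic only flags that something differs between the studies. It says nothing about which protocol, definition or patient population is responsible, and that is where the semantic alignment has to happen.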

In a very literal (though implicit) way, each lab is deploying its own ontology regarding which procedures make sense (and what those commitments entail), but also which variables are thought significant, what was controlled for, and what was out of the scope of the experiment (and why). It quickly becomes very messy when trying to integrate results across two (let alone multiple) labs, unless one is directly replicating an experiment. In some cases, each hypothesis may carry implicit assumptions that are reflected in how the experiment is ultimately performed.

In cases where one is not replicating an experiment but is trying to extend a hypothesis or allow for other factors, these differences across the experiments need to be accounted for. At the very least, they should be accounted for at a minimum level of analysis that reveals their statistical implications.
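
A toy sketch of what "accounting for the differences" might look like before any statistics are run: list the fields on which two protocol descriptions disagree, or which only one lab declares. The field names and values are invented for illustration, and a real alignment would compare ontology terms rather than raw strings.

```python
def protocol_mismatches(protocol_a: dict, protocol_b: dict) -> dict:
    """Return {field: (value_a, value_b)} for every field where the two
    protocol descriptions differ. None means the lab did not declare the field."""
    mismatches = {}
    for field in sorted(set(protocol_a) | set(protocol_b)):
        value_a = protocol_a.get(field)
        value_b = protocol_b.get(field)
        if value_a != value_b:
            mismatches[field] = (value_a, value_b)
    return mismatches


if __name__ == "__main__":
    # Hypothetical protocol descriptions for two labs.
    lab_a = {"dose_mg": 4, "outcome_definition": "adjudicated event",
             "patient_population": "adults 40-65"}
    lab_b = {"dose_mg": 8, "outcome_definition": "adjudicated event",
             "follow_up_weeks": 24}
    for field, (a, b) in protocol_mismatches(lab_a, lab_b).items():
        print(f"{field}: lab A = {a!r}, lab B = {b!r}")
```

Each reported mismatch is a difference that would need to be shown not to affect the analysis before the results are pooled.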

To some extent, this is being addressed for certain cultural subgroups in various parts of science, with the establishment of a variety of protocol libraries or standards. Indeed, in some cases, ontologies are being developed to help align protocols [1], [2], [3], [4]. But it's often taken for granted that people are correctly interpreting the intent of authors through the papers that are being published. IMO, as this article points out, errors abound, and many practitioners perform poor semantic mappings across experiments in the same domain.

Lastly, especially in the age of micro-publishing, when novel data are published that "break" an existing ontology, and/or call for a revision, perhaps the ontology update cycle is better served through Bayesian revision. This one seems to be applicable on more of a case-by-case basis, but the principle seems worth investigating.
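
To sketch what Bayesian revision could mean here: treat "the current ontology is adequate for this domain" as a hypothesis H, and update its probability each time a new result arrives. The prior and the likelihoods below are made-up numbers; choosing them sensibly is the hard, case-by-case part.

```python
def bayes_update(prior: float, p_data_given_h: float, p_data_given_not_h: float) -> float:
    """Posterior probability of H after one observation, by Bayes' rule."""
    evidence = prior * p_data_given_h + (1.0 - prior) * p_data_given_not_h
    if evidence == 0.0:
        raise ValueError("observation is impossible under both hypotheses")
    return prior * p_data_given_h / evidence


if __name__ == "__main__":
    belief = 0.9  # assumed prior confidence that the ontology is adequate
    # Two hypothetical "anomalous" results: unlikely if the ontology is
    # adequate (0.1), likely if it is not (0.8).
    for _ in range(2):
        belief = bayes_update(belief, 0.1, 0.8)
        print(f"belief that the ontology is adequate: {belief:.3f}")
```

With these illustrative numbers, a single anomalous publication leaves the question open, while a second one makes revision hard to avoid, so confidence erodes gradually instead of flipping on the first surprising result.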

[1] Michel Kinsy, Zoé Lacroix, Christophe Legendre, Piotr Wlodarczyk and Nadia Yacoubi. ProtocolDB: Storing Scientific Protocols with a Domain Ontology. In WISE 2007 Workshops, 2007
[3] Maccagnan A, Riva M, Feltrin E, Simionati B, Vardanega T, Valle G, Cannata N. Combining ontologies and workflows to design formal protocols for biological laboratories. Automated Experimentation, 2010.
[4] Larisa N. Soldatova, Wayne Aubrey, Ross D. King and Amanda Clare. The EXACT description of biomedical protocols. Bioinformatics, 2008.

