Friday, 8 November 2013

The Explorer Delusion - We need to conquer theory evaluation and strong inference, not new continents

[This is an edited version of a previous post]

"It is the theory that decides what may be observed"
- Einstein (quoted by Heisenberg)


We are discussing theory... finally! 

The context in which theory is discussed is mostly its potential relevance for finding more replicable effects (see Twitter, the blog posts by Rolf Zwaan, Michael Kraus and Andrew Wilson, and the commentaries on those posts). I've seen statements like: methods are more important than theory, theory can be useful sometimes, theory and method inform each other, and even theory is essential...

The thing is... 

It takes a long time, but try staring at a method for a while, examine it and ask "why?" like a 5-year-old, over and over... at some point you will realise the truth: There is no method! Only theory.

Any effect discovered by a method is in fact a prediction by a theory. It's probably more precise to say that a measurement context is predicted by the formalism within which a theory is defined. The effect is something that may be expected to emerge as a pattern in the data measured in a specific context, dictated by a formalism or theoretical framework.

Most of mainstream psychological science uses sample-based statistics and probability theory as tools of inference (e.g., Bosman, Cox, Hasselman & Wijnants, 2013). These tools were developed to study ergodic systems and assume classical physical measurements apply to living systems (e.g., property attribution by measurement outcome).

That's theory/formalism.

Next time you run an analysis of variance on a sample of subjects randomly assigned to conditions of a factor, ask yourself why it is you are looking for sources of unique variance in your data that are attributable to efficient causes (the levels of the factor you manipulated)? 

You do this because you assume ergodic theory applies to human beings. That assumption is what lets you apply probability theory to sample-based statistics, which you take to be an aggregate of properties of the individuals that make up the sample, because you assume you are performing a classical physical measurement.
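To make that assumption concrete, here is a toy sketch in Python. The data are invented purely for illustration (three hypothetical persons, five repeated measurements each), and the `pearson` and `make_person` helpers are my own, not taken from any cited paper. Within every person, y falls as x rises, yet the pooled sample shows a strong positive relation. Only if the ergodic assumption held would the sample-level statistic tell you something about each individual.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def make_person(baseline):
    """Invented data: within a person, y goes down whenever x goes up."""
    deviations = (-2, -1, 0, 1, 2)
    xs = [baseline + d for d in deviations]
    ys = [baseline - d for d in deviations]
    return xs, ys


# Three hypothetical persons who differ only in their baseline level.
persons = [make_person(b) for b in (10, 20, 30)]

# Structure inside each individual (time-like, within-person).
within = [pearson(xs, ys) for xs, ys in persons]

# Structure of the pooled sample (ensemble-like, between-person).
pooled_x = [x for xs, _ in persons for x in xs]
pooled_y = [y for _, ys in persons for y in ys]
pooled = pearson(pooled_x, pooled_y)

print("within-person correlations:", within)   # each is -1.0
print("pooled sample correlation:", round(pooled, 2))  # roughly 0.94
```

The sample statistic is perfectly real; it just does not describe any of the individuals in the sample. Treating it as if it did is a theoretical commitment, not a neutral methodological choice.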

All the decisions you made to design the measurement context in which you collected your data were guided by theory. Those assumptions usually escape falsification, however (but that's another story).

The Explorer Delusion

If you believe it is possible that a true effect can somehow be discovered, out there in reality, like a land mass across the ocean where everyone said there would be dragons, or a new species of silicon-based life forms at the other end of the wormhole, then you show one of the symptoms of participating in a failing system of theory evaluation and revision that I dubbed the explorer delusion.

This refers to the belief expressed by many experimental psychological scientists that the purpose of scientific inquiry is to go where no man has gone before and observe the phenomena that are “out there” in reality waiting to be uncovered by clever experimental manipulation and perhaps some more arbitrary poking about as well. 

A laboratory experiment is, however, not a field study or an excursion beyond the neutral zone. Even if it were, I would argue that wherever you go as a scientist, boldly or otherwise, you will be guided, and quite possibly even blinded, by a theory or a mathematical formalism about reality that is in most cases implicitly present in your theorising.

Let's analyse this delusion by scrutinising a recent paper by Greenwald (2012) entitled: “There is nothing so theoretical as a good method”, which is a reference to the famous quote by a giant of psychological science, Kurt Lewin (1951). This also allows me to comment on what it actually is that Platt meant to say by the term "strong inference" in his 1964 paper.

Greenwald is explicit about his position towards theory; he is not anti-theoretic, as he acknowledges that theories achieve parsimonious understanding and guide useful applications (but he does not specify… of what?). The author is however also skeptical of theory, because he noticed the ability of theory to restrict open-mindedness. This is indeed a proper description of a theory: It is a specific tunnel-vision, but from the perspective of the Structural Realist (forgive me, I will explain this position more precisely in the near future), this tunnel-vision is only temporary. It will be no surprise I disagree with the following:

“When alternative theories contest the interpretation of an interesting finding, researchers are drawn like moths to flame. J. R. Platt (1964) gave the approving label “strong inference” to experiments that were designed as crucial empirical confrontations between theories that competed to explain a compellingly interesting empirical result.” (Greenwald, 2012, pp. 99–100, emphasis added)

That is not at all what Platt meant by strong inference, but incidentally we find another symptom of a failing system of theory evaluation, the interpretation fallacy: Theories do not compete for their ability to provide an understandable description or explanation of empirical phenomena. They compete for the ability to predict measurement contexts in which phenomena may be observed and they compete for the accuracy with which measurement outcomes were predicted. And J.R. Platt agrees with this perspective as he describes very clearly:

“Strong inference consists of applying the following steps to every problem in science, formally and explicitly and regularly:
1) Devising alternative hypotheses;
2) Devising a crucial experiment (or several of them), with alternative possible outcomes, each of which will, as nearly as possible, exclude one or more of the hypotheses; 
3) Carrying out the experiment so as to get a clean result; 
1') Recycling the procedure, making subhypotheses or sequential hypotheses to refine the possibilities that remain; and so on.” 
(Platt, 1964, p. 347, emphasis added)

Strong inference starts with devising alternative hypotheses to a problem in science and not with an interesting finding. Platt comments that steps 1 and 2 require intellectual invention, which I take the liberty to translate as ‘theorizing about reality’. That is what you do when you devise a method.
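Platt's steps read almost like an algorithm, so here is a minimal sketch of the loop in Python. Everything in it is hypothetical: the hypothesis names (H1 to H3), the experiments (E1, E2), the outcomes and the `strong_inference` function are placeholders I made up, and a real crucial experiment is of course never a dictionary lookup. The point is only the order of operations: the alternative hypotheses and their predictions exist before any observation is made.

```python
def strong_inference(hypotheses, experiments, observe):
    """Sketch of Platt's loop (an illustration, not Platt's own formalism).

    hypotheses:  dict mapping a hypothesis name to its predictions,
                 themselves a dict mapping experiment name -> predicted outcome
    experiments: ordered list of experiment names (step 2, devised in advance)
    observe:     callable that runs an experiment and returns its outcome (step 3)
    """
    surviving = dict(hypotheses)  # step 1: the alternatives come first
    for experiment in experiments:
        if len(surviving) <= 1:
            break  # nothing left to discriminate between
        outcome = observe(experiment)
        # Exclude every hypothesis whose prediction failed. A hypothesis
        # that made no prediction for this experiment is not excluded.
        surviving = {
            name: predictions
            for name, predictions in surviving.items()
            if predictions.get(experiment, outcome) == outcome
        }
        # step 1': recycle with whatever remains
    return sorted(surviving)


# Hypothetical alternatives and the measurement outcomes they predict.
hypotheses = {
    "H1": {"E1": "up", "E2": "up"},
    "H2": {"E1": "up", "E2": "down"},
    "H3": {"E1": "down"},
}

# Stand-in for reality: what each experiment would actually show.
nature = {"E1": "up", "E2": "down"}

result = strong_inference(hypotheses, ["E1", "E2"], nature.get)
print(result)  # ['H2']
```

Note that there is no step in which an ‘interesting finding’ is observed first and explained afterwards: an outcome only counts because some hypothesis predicted it or its opposite.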

Science does not test whether posited ontology is true

One source of evidence for his argument concerns 13 papers, listed in a table, that started controversies in psychological science on average 44 years ago and which still have no resolution. The author claims that in order to resolve the controversies, the method of strong inference was applied, which obviously failed. Also, it is claimed that philosophy of science provides no answers to resolve the controversies, because it discusses (apparently endlessly) whether such issues can be resolved empirically in principle. It is clear that Greenwald is referring to the resolution of these controversies as a resolution about the ‘reality’ of the ontology of a theory. This is again a matter of interpretation and is not what formal theory evaluation is about. The constituents of reality posited to exist by a theory are irrelevant in theory evaluation. As long as everything behaves according to the predictions by the theory, we should just accept those constituents as temporary vehicles for understanding. I believe the theories involved in these controversies were not properly evaluated for their predictive power and empirical accuracy. I don't know if they can be evaluated in that way; if they cannot, the conclusion must be that the theories are trivial.

This impression that ontology evaluation seems to be the problem here is indeed supported by the descriptions provided for the 13 controversies: It is primarily a list of clashes of ontology, e.g., Spreading activation vs. Compound cueing. Further support comes from the examples provided to argue that even if philosophy had an answer, this would not keep scientists from continuing the debate. The fact that scientists do not do this implies to the author there must be another way than strong inference to resolve controversies in science. This is illustrated by examples in which a scientific community was able to achieve consensus about a problem in their discipline (the classification of Pluto as a dwarf planet, HIV as the cause of AIDS and the influence of human activity on global warming). The author suggests that controversies in psychology could be resolved if only a reasonable consensus could be achieved.

I cannot disagree with the author on his wish for a science that works towards reaching consensus about the phenomena in its empirical record, instead of wasting energy on definite existence proofs for the ontologies of competing theories. Recall the history of the quantum formalism: two very different theoretical descriptions of reality (wave vs. particle ontologies) were found to be the same for all intents and purposes. I am certain that scientists in cosmology, virology and climatology used strong inference to work towards those consensus resolutions, but I did not check it. Strong inference and consensus formalism science go hand in hand.

What I can say is that Platt’s recycling procedure (step 1’) suggests replication attempts should be carried out and apparently there is somewhat of a problem with replication of phenomena in psychological science. So this makes it again very unlikely any strong inference has been applied to resolve theoretical disputes in psychological science. Indeed, one of the authors listed to have caused a controversy that was unresolved by strong inference, recently challenged the discipline to start replicating the ‘interesting findings’ in its empirical record (e.g. Yong, 2012).

(There must be some proverb about dismissing something before its merits have been properly examined...)

As a second source of evidence to support his suspicions about the benefits of theorising, Greenwald examines the descriptions of Nobel Prizes to see whether they were awarded for theoretical or methodological contributions. The explorer delusion is obvious here; Greenwald highly values the appearance of the word ‘discovery’:
“Most “discovery” citations were for methods that permitted previously impossible observations, but in a minority of these, “discovery” indicated a theoretical contribution.” 
He concludes that theory was important for the development of methods, and that novel methods produced inconceivable results, that prompted new theory.

I am quite certain that the inconceivable results referred to were predicted by a theory or considered as an alternative hypothesis. They concern measurement contexts one just does not accidentally stumble upon. If outcomes were surprising given the predicted context, an anomaly to the theory was found, and in that case, naturally, a new theory would have to be created. It was however due to an anomaly to a theoretical prediction, not due to a ‘discovery’ of a phenomenon by a method! The Large Hadron Collider (or any other billion-dollar instrument of modern physics) was not built as a method, a vehicle to seek out previously unknown phenomena like the starship U.S.S. Enterprise. Theory very strongly predicted a measurement context in which a boson should be observable that completed the standard model of particle physics. The methods scientists use for obtaining knowledge about the structure of reality are the result of testing predictions by theories, without exception. Satellites are not sent into space equipped with multi-million dollar X-ray detectors just to see what they will find when they get there.

Æther-dragging vs. Social Priming

I conclude by commenting on the way the author describes why Michelson won the Nobel Prize for Physics in 1907. This involves a recurring theme in a paper I am about to submit: The luminiferous Æther. Experimental physicists like Michelson and Morley spent most of their academic careers (and most of their money) on experiments that tested the empirical accuracy of theories that predicted a very specific observable phenomenon called Æther-dragging. Their most famous experiment, reported in “On the Relative Motion of the Earth and the Luminiferous Ether” (Michelson & Morley, 1887), showed very accurately and consistently that there was no such thing as an Æther, or at least, that its influence on light and matter was not as large as the Æther-dragging hypothesis predicted it would be. This of course harmed the precision and accuracy of Æther-based theories of the cosmos, but to hint, as Greenwald seems to do, that the method ‘caused’ Einstein to create special relativity theory is far-fetched.

Michelson won the Nobel Prize for Physics in 1907 for the very consistent null-result (yes psychological science, such things can be important) and for the development of the interferometer instruments that meticulously failed to measure any trace of the Æther (cf. Michelson, 1881). Their commitment to the Æther was adamant though. To be absolutely certain that the minute interferences that were occasionally measured were indeed due to measurement error, instruments of increasing accuracy and sensitivity were built. The largest were many meters wide and placed at high altitude on heavy slabs of marble floating on quicksilver in order to avoid vibrations interfering with the measurement process. Now that is a display of ontological commitment! It was however as much motivated by theoretical prediction as the construction of the Large Hadron Collider. Not a theory-less discovery by some clever poking about.

Have Social Priming effects ever been put through tests as severe as the ones the Æther-dragging effect was subjected to?

Greenwald admits that the word theory is often used in Michelson and Morley’s 1887 article, so theory must have played an important role in the design of the instruments. The role was not just 'important'; without the theory there would have been no method at all. In fact, if a theory of special relativity had been published 20 years before 1905 (physicists knew something like relativity was necessary), there would have been no instruments constructed at all because:
"Whether the ether exists or not matters little - let us leave that to the metaphysicians; what is essential for us is, that everything happens as if it existed, and that this hypothesis is found to be suitable for the explanation of phenomena. After all, have we any other reason for believing in the existence of material objects? That, too, is only a convenient hypothesis; only, it will never cease to be so, while some day, no doubt, the ether will be thrown aside as useless." (Poincaré, 1889/1905, p. 211). 

And indeed, the Æther was thrown aside as useless, because a method devised to test a prediction by a theory yielded null results. Strong inference means this repeated null-result has consequences for the credibility of the theory that predicted the phenomenon. Apparently, in psychological science, this is a difficult condition to achieve.

The Structural Realist's take home message is: 

  1. We should believe what scientific theories tell us about the structure of the unobservable world, but
  2. We should be skeptical about what they tell us about the posited ontology of the unobservable world.

In this quote by Poincaré may lie the answer to Greenwald's interpretation of current practice of psychological science (which is in fact a very accurate description of the problems we have with theory evaluation, I just do not agree with the interpretation): Why does Poincaré reserve a special place for the hypothesis about material objects, which will never cease to be so?

Still believe it is possible to use a method that was not predicted to yield measurement outcomes by a theory about reality? 

I'll think of some more examples (again).

Greenwald, A. G. (2012). There Is Nothing So Theoretical as a Good Method. Perspectives on Psychological Science, 7(2), 99–108. doi:10.1177/1745691611434210
Michelson, A. A. (1881). The Relative Motion of the Earth and the Luminiferous Ether. American Journal of Science, 22(128), 120–129.
Michelson, A. A., & Morley, E. W. (1887). On the Relative Motion of the Earth and the Luminiferous Ether. American Journal of Science, 34(203), 333–345.
Platt, J. (1964). Strong Inference. Science, 146(3642), 347–353. Retrieved from Inference (Platt).pdf
Poincaré, H. (1905). Science and Hypothesis. New York: The Walter Scott Publishing Co., LTD.
Yong, E. (2012). Nobel laureate challenges psychologists to clean up their act. Nature.
