Friday, 8 November 2013

The Explorer Delusion - We need to conquer theory evaluation and strong inference, not new continents

[This is an edited version of a previous post]

"It is the theory that decides what may be observed"
- Einstein (quoted by Heisenberg)


We are discussing theory... finally! 

Theory is mostly discussed in the context of its potential relevance for finding more replicable effects (see Twitter, the blog posts by Rolf Zwaan, Michael Kraus and Andrew Wilson, and the commentaries on those posts). I've seen statements like: methods are more important than theory; theory can be useful sometimes; theory and method inform each other; and even: theory is essential...

The thing is... 

It takes a long time, but try staring at a method for a while, examine it and ask "why?" like a 5-year old, over and over... at some point you will realise the truth: There is no method! Only theory.

Any effect discovered by a method is in fact a prediction by a theory. It's probably more precise to say that a measurement context is predicted by the formalism within which a theory is defined. The effect is something that may be expected to emerge as a pattern in the data measured in a specific context, dictated by a formalism or theoretical framework.

Most of mainstream psychological science uses sample based statistics and probability theory as tools of inference (e.g., Bosman, Cox, Hasselman & Wijnants, 2013). These tools were developed to study ergodic systems and assume classical physical measurements apply to living systems (e.g., property attribution by measurement outcome). 

That's theory/formalism.

Next time you run an analysis of variance on a sample of subjects randomly assigned to conditions of a factor, ask yourself why it is you are looking for sources of unique variance in your data that are attributable to efficient causes (the levels of the factor you manipulated)? 

You do this because you assume ergodic theory applies to human beings, and that you can therefore apply probability theory to sample-based statistics, which you assume to be aggregates of the properties of the individuals that make up the sample, because you assume you are performing a classical physical measurement.

All the decisions you made to design the measurement context in which you collected your data were guided by theory. Those assumptions usually escape falsification, however (but that's another story).

The Explorer Delusion

If you believe it is possible that a true effect can somehow be discovered, out there in reality, like a land mass across the ocean where everyone said there would be dragons, or a new species of silicon-based life forms at the other end of the worm-hole, then you show one of the symptoms of participating in a failing system of theory evaluation and revision that I dubbed the explorer delusion.

This refers to the belief expressed by many experimental psychological scientists that the purpose of scientific inquiry is to go where no man has gone before and observe the phenomena that are “out there” in reality waiting to be uncovered by clever experimental manipulation and perhaps some more arbitrary poking about as well. 

A laboratory experiment is however not a field study or an excursion beyond the neutral zone. Even if it were, I would argue that wherever you go as a scientist, boldly or otherwise, you will be guided, and quite possibly even blinded, by a theory or a mathematical formalism about reality that is in most cases implicitly present in your theorising.

Let's analyse this delusion by scrutinising a recent paper by Greenwald (2012) entitled: “There is nothing so theoretical as a good method”, which is a reference to the famous quote by a giant of psychological science, Kurt Lewin (1951). This also allows me to comment on what it actually is that Platt meant to say by the term "strong inference" in his 1964 paper.

Greenwald is explicit about his position towards theory; he is not anti-theoretic, as he acknowledges that theories achieve parsimonious understanding and guide useful applications (but he does not specify… of what?). The author is however also skeptical of theory, because he noticed the ability of theory to restrict open-mindedness. This is indeed a proper description of a theory: It is a specific tunnel-vision, but from the perspective of the Structural Realist (forgive me, I will explain this position more precisely in the near future), this tunnel-vision is only temporary. It will be no surprise that I disagree with the following:

“When alternative theories contest the interpretation of an interesting finding, researchers are drawn like moths to flame. J. R. Platt (1964) gave the approving label “strong inference” to experiments that were designed as crucial empirical confrontations between theories that competed to explain a compellingly interesting empirical result.” (Greenwald, 2012, pp. 99–100, emphasis added)

That is not at all what Platt meant by strong inference, but incidentally we find another symptom of a failing system of theory evaluation, the interpretation fallacy: Theories do not compete for their ability to provide an understandable description or explanation of empirical phenomena. They compete for the ability to predict measurement contexts in which phenomena may be observed and they compete for the accuracy with which measurement outcomes were predicted. And J.R. Platt agrees with this perspective as he describes very clearly:

“Strong inference consists of applying the following steps to every problem in science, formally and explicitly and regularly
1) Devising alternative hypotheses;
2) Devising a crucial experiment (or several of them), with alternative possible outcomes, each of which will, as nearly as possible, exclude one or more of the hypotheses; 
3) Carrying out the experiment so as to get a clean result; 
1') Recycling the procedure, making subhypotheses or sequential hypotheses to refine the possibilities that remain; and so on.” 
(Platt, 1964, p. 347, emphasis added)

Strong inference starts with devising alternative hypotheses to a problem in science and not with an interesting finding. Platt comments that steps 1 and 2 require intellectual invention, which I take the liberty to translate as ‘theorizing about reality’. That is what you do when you devise a method.

Science does not test whether posited ontology is true

One source of evidence for his argument concerns 13 papers, listed in a table, that started controversies in psychological science on average 44 years ago and that still have no resolution. The author claims that the method of strong inference was applied in order to resolve these controversies, and that it obviously failed. Also, it is claimed that philosophy of science provides no answers to resolve the controversies, because it discusses (apparently endlessly) whether such issues can be resolved empirically in principle. It is clear that Greenwald is referring to the resolution of these controversies as a resolution about the ‘reality’ of the ontology of a theory. This is again a matter of interpretation and is not what formal theory evaluation is about. The constituents of reality posited to exist by a theory are irrelevant in theory evaluation. As long as everything behaves according to the predictions by the theory, we should just accept those constituents as temporary vehicles for understanding. I believe these controversy theories were not properly evaluated for their predictive power and empirical accuracy. I don't know if they can be evaluated in that way; if they cannot, the conclusion must be that the theories are trivial.

This impression that ontology evaluation seems to be the problem here is indeed supported by the descriptions provided for the 13 controversies: It is primarily a list of clashes of ontology, e.g., Spreading activation vs. Compound cueing. Further support comes from the examples provided to argue that even if philosophy had an answer, this would not keep scientists from continuing the debate. The fact that scientists do not do this implies to the author there must be another way than strong inference to resolve controversies in science. This is illustrated by examples in which a scientific community was able to achieve consensus about a problem in their discipline (the classification of Pluto as a dwarf planet, HIV as the cause of AIDS and the influence of human activity on global warming). The author suggests that controversies in psychology could be resolved if only a reasonable consensus could be achieved.

I cannot disagree with the author on his wish for a science that worked towards reaching consensus about the phenomena in its empirical record, instead of wasting energy on definite existence proofs for the ontologies of competing theories. Recall the history of the quantum formalism: two very different theoretical descriptions of reality (wave vs. particle ontologies) were found to be the same for all intents and purposes. I am certain that scientists in cosmology, virology and climatology used strong inference to work towards those consensus resolutions, but I did not check it. Strong inference and consensus formalism science go hand in hand.

What I can say is that Platt’s recycling procedure (step 1’) suggests replication attempts should be carried out and apparently there is somewhat of a problem with replication of phenomena in psychological science. So this makes it again very unlikely any strong inference has been applied to resolve theoretical disputes in psychological science. Indeed, one of the authors listed to have caused a controversy that was unresolved by strong inference, recently challenged the discipline to start replicating the ‘interesting findings’ in its empirical record (e.g. Yong, 2012).

(There must be some proverb about dismissing something before its merits have been properly examined...)

As a second source of evidence to support his suspicions about the benefits of theorising, Greenwald examines the descriptions of Nobel Prizes to see whether they were awarded for theoretical or methodological contributions. The explorer delusion is obvious here; Greenwald highly values the appearance of the word ‘discovery’:
“Most “discovery” citations were for methods that permitted previously impossible observations, but in a minority of these, “discovery” indicated a theoretical contribution.” 
He concludes that theory was important for the development of methods, and that novel methods produced inconceivable results, that prompted new theory.

I am quite certain that the inconceivable results referred to were predicted by a theory or considered as an alternative hypothesis. They concern measurement contexts one just does not accidentally stumble upon. If outcomes were surprising given the predicted context, an anomaly to the theory was found, and in that case, naturally, a new theory would have to be created. It was however due to an anomaly to a theoretical prediction, not due to a ‘discovery’ of a phenomenon by a method! The Large Hadron Collider (or any other billion-dollar instrument of modern physics) was not built as a method, a vehicle to seek out previously unknown phenomena like the starship U.S.S. Enterprise. Theory very strongly predicted a measurement context in which a boson should be observable that completed the standard model of particle physics. The methods scientists use for obtaining knowledge about the structure of reality are the result of testing predictions by theories, without exception. Satellites are not sent into space equipped with multi-million dollar X-ray detectors just to see what they will find when they get there.

Æther-dragging vs. Social Priming

I conclude by commenting on the way the author describes why Michelson won the Nobel Prize for Physics in 1907. This involves a recurring theme in a paper I am about to submit: The luminiferous Æther. Experimental physicists like Michelson and Morley spent most of their academic careers (and most of their money) on experiments that tested the empirical accuracy of theories that predicted a very specific observable phenomenon called Æther-dragging. Their most famous experiment reported in “On the Relative Motion of the Earth and the Luminiferous Ether” (Michelson & Morley, 1887), showed very accurately and consistently that there was no such thing as an Æther, or at least, that its influence on light and matter was not as large as the Æther-dragging hypothesis predicted it would be. This of course harmed the precision and accuracy of Æther-based theories of the cosmos, but to hint, as Greenwald seems to do, that the method ‘caused’ Einstein to create special relativity theory is farfetched.

Michelson won the Nobel Prize for Physics in 1907 for the very consistent null-result (yes psychological science, such things can be important) and for the development of the interferometer instruments that meticulously failed to measure any trace of the Æther (cf. Michelson, 1881). Their commitment to the Æther was adamant though. To be absolutely certain that the minute interferences that were occasionally measured were indeed due to measurement error, instruments of increasing accuracy and sensitivity were built. The largest were many meters wide and placed at high altitude on heavy slabs of marble floating on quicksilver in order to avoid vibrations interfering with the measurement process. Now that is a display of ontological commitment! It was however as much motivated by theoretical prediction as the construction of the Large Hadron Collider. Not a theory-less discovery by some clever poking about.

Is there an analogous example of the severe tests of the Æther dragging effect for the tests that Social Priming effects have been put through?

Greenwald admits that the word theory is often used in Michelson and Morley’s 1887 article, so theory must have played an important role in the design of the instruments. The role was not just 'important'; without the theory there would have been no method at all. In fact, if a theory of special relativity had been published 20 years before 1905 (physicists knew something like relativity was necessary), there would have been no instruments constructed at all because:
"Whether the ether exists or not matters little - let us leave that to the metaphysicians; what is essential for us is, that everything happens as if it existed, and that this hypothesis is found to be suitable for the explanation of phenomena. After all, have we any other reason for believing in the existence of material objects? That, too, is only a convenient hypothesis; only, it will never cease to be so, while some day, no doubt, the ether will be thrown aside as useless." (Poincaré, 1889/1905, p. 211). 

And indeed, the Æther was thrown aside as useless, because a method devised to test a prediction by a theory yielded null results. Strong inference means this repeated null-result has consequences for the credibility of the theory that predicted the phenomenon. Apparently, in psychological science, this is a difficult condition to achieve.

The Structural Realist's take home message is: 

  1. We should believe what scientific theories tell us about the structure of the unobservable world, but
  2. We should be skeptical about what they tell us about the posited ontology of the unobservable world.

In this quote by Poincaré may lie the answer to Greenwald's interpretation of current practice of psychological science (which is in fact a very accurate description of the problems we have with theory evaluation, I just do not agree with the interpretation): Why does Poincaré reserve a special place for the hypothesis about material objects, which will never cease to be so?

Still believe it is possible to use a method that was not predicted to yield measurement outcomes by a theory about reality? 

I'll think of some more examples (again).

Greenwald, A. G. (2012). There Is Nothing So Theoretical as a Good Method. Perspectives on Psychological Science, 7(2), 99–108. doi:10.1177/1745691611434210
Michelson, A. A. (1881). The Relative Motion of the Earth and the Luminiferous Ether. American Journal of Science, 22(128), 120–129.
Michelson, A. A., & Morley, E. W. (1887). On the Relative Motion of the Earth and the Luminiferous Ether. American Journal of Science, 34(203), 333–345.
Platt, J. (1964). Strong Inference. Science, 146(3642), 347–353.
Poincaré, H. (1905). Science and Hypothesis. New York: The Walter Scott Publishing Co., LTD.
Yong, E. (2012). Nobel laureate challenges psychologists to clean up their act. Nature.

Sunday, 6 October 2013

Respect your elders: Lykken's (1968) correlated ambient noise: Do fractal scaling and violations of the ergodic condition evidence the crud factor?

Lykken (1968) estimated that the “unrelated” molar variables involved in most studies in psychology share 4-5% common variance, meaning that, with 0 measurement error, a correlation of about .20 can be expected between any two of them. This really depends on the field of inquiry, but it has been suggested that estimates between .15 and .35 are by no means an exaggeration.
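The step from shared variance to correlation is just a square root. A minimal sketch of that arithmetic (the function name is mine, for illustration only):

```python
import math

def correlation_from_shared_variance(shared_variance: float) -> float:
    """Convert a proportion of shared variance (r squared) into a correlation r."""
    if not 0.0 <= shared_variance <= 1.0:
        raise ValueError("shared variance must be a proportion between 0 and 1")
    return math.sqrt(shared_variance)

# Lykken's 4-5% common variance, expressed as correlations.
for percent in (4, 5):
    r = correlation_from_shared_variance(percent / 100)
    print(f"{percent}% shared variance -> r = {r:.2f}")
```

Read the other way around, the suggested range of .15 to .35 corresponds to roughly 2% to 12% shared variance.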

The origins of such correlations are debated (and of course disputed), but I consider them as an example of the violation of the ergodic theorems for studying human behaviour and development (Molenaar & Campbell, 2009; Molenaar, 2008). The ergodic condition applies to systems whose current state in a state/phase space (that describes all the theoretically possible states a system could be in), is very weakly, or not at all influenced by its history, or its initial conditions. Hidden Markov models are an example of such systems. These systems have no "memory" for their initial state and formally this means their time averaged trajectories through phase space are about equal to their space averaged trajectories. Given enough time, they will visit all the regions of the phase space (formally there's a difference between phase and state space, which I will ignore here).

For Psychological Science the ergodic assumptions related to probability theory are important: In an ergodic system it does not matter if you measure a property of the system 100 times as a repeated measurement (time average), or measure the property of 100 ergodic systems at the same point in time (space average). The latter is of course the sample of participants from which inferences are drawn in social science. The former would be repeated measurements within a single subject. In an ergodic system, the averages of these different types of measurement would be the same. It does not matter for the expected averaged result whether you roll 1 die 100 times in a row, or 100 dice in 1 throw.
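The dice example is easy to try for yourself. Below is a small sketch, assuming fair six-sided dice simulated with Python's standard random module; the function names and the idea of giving every die its own generator are my own illustrative choices.

```python
import random

def time_average(n_rolls: int, rng: random.Random) -> float:
    """Roll ONE die n_rolls times in a row and average the outcomes."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

def space_average(n_dice: int, master: random.Random) -> float:
    """Roll n_dice DIFFERENT dice once each and average the outcomes.

    Each die gets its own generator (seeded from a master generator) to
    mimic separate systems measured at the same point in time.
    """
    dice = [random.Random(master.getrandbits(64)) for _ in range(n_dice)]
    return sum(die.randint(1, 6) for die in dice) / n_dice

rng = random.Random(2013)
for n in (100, 10_000):
    print(n, round(time_average(n, rng), 2), round(space_average(n, rng), 2))
```

Both averages settle around the same expected value of 3.5, because a die has no memory of its previous rolls. That is the property the ergodic condition demands, and the property in question for psychological variables.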

Trick or treat?

Now, the trick question is, do you think such is the case for psychological variables? Would I get the same developmental trajectory if I measured IQ in a single human being each year from 1 to 80 (assuming I have a 0-error, unbiased, IQ measuring instrument and a very long lifespan) as when I would draw a sample of 80 participants aged 1 through 80 and measured their IQ on one occasion? Very few scientists would predict I would obtain the same results in both situations, but in social science we do act as if such would be the case. To me, any evidence of a system's future state being influenced by a state at a previous point in time (memory) is a violation of the ergodic condition and basically should indicate to a scientist to stop using central tendency measures and sampling theory to infer knowledge about the properties of this system. If you do not want to go that far, but still feel uncomfortable about my IQ example, you should probably accept that there may be some truth to Lykken's suggestion about a default ambient correlation between variables in social science. Simply put, if you walk like a duck, there is a small base expectancy that you will also talk like a duck.
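To contrast this with the dice, here is a minimal sketch of a system with memory. It is a toy random walk in which every state builds on the previous one, with 80 measurement occasions and 2000 simulated "individuals". It is emphatically not a model of IQ; the process, the numbers and the names are assumptions chosen only to show time averages and cross-sectional averages coming apart.

```python
import random
from statistics import stdev

def random_walk(n_steps: int, rng: random.Random) -> list:
    """A trajectory with memory: each state is the previous state plus noise."""
    level, path = 0.0, []
    for _ in range(n_steps):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

rng = random.Random(1968)
walks = [random_walk(80, rng) for _ in range(2000)]

# Time average: one value per individual, averaged over its 80 occasions.
time_averages = [sum(walk) / len(walk) for walk in walks]

# Space average: all individuals measured once, at the final occasion.
cross_section_mean = sum(walk[-1] for walk in walks) / len(walks)

print("spread of individual time averages:", round(stdev(time_averages), 2))
print("cross-sectional mean at the last occasion:", round(cross_section_mean, 2))
```

The cross-sectional mean stays close to zero, while the individual time averages are scattered widely around it. In this toy system the sample average describes hardly any of the individuals that make up the sample.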

Another line of evidence revealing that everything is correlated (over time), or "has memory", is of course ubiquitous fractal scaling in repeated measurements of human physiology and performance (e.g., Kello et al., 2010). If measurements are interdependent rather than independent, it does not necessarily point to a violation of the ergodic condition, but combined, the two frameworks do predict very different measurement outcomes in certain contexts (e.g., Diniz et al., 2011). My money is still on the "long memory" interpretation.

Based on the lower estimates of Lykken's correlation, the expected difference between any sample-based averages would be about 0.5 standard deviations. The test against a null hypothesis of “no association” is often a test against a “straw man” null hypothesis because it can be known in advance that an assumption of no association at all is false. Therefore, a researcher can maximize his chances to corroborate any weak prediction of association between variables, by making sure a large enough number of data points are collected. You know, those statistical power recommendations you have been hearing about for a while now. A genuine “crud factor” (cf. Meehl, 1990) implies a researcher has a chance of 1 in 4 to evidence an association using a sample size of 100 data points, without even needing a truth-like theory to predict an association or its sign.
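How quickly a larger sample turns a small ambient correlation into a "corroboration" can be sketched with a standard power approximation. This is not the simulation reported in this post: it assumes a one-sided test of a zero correlation based on the Fisher z transformation, and the function name is made up for the occasion.

```python
import math
from statistics import NormalDist

def power_one_sided(rho: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a one-sided test of H0: rho = 0.

    Uses the Fisher z approximation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("n must be larger than 3")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)
    expected_z = math.atanh(rho) * math.sqrt(n - 3)
    return 1.0 - std_normal.cdf(z_crit - expected_z)

# A crud factor of .1 at increasing sample sizes.
for n in (25, 100, 400, 1600):
    print(n, round(power_one_sided(0.1, n), 2))
```

Under these assumptions a correlation of .1 is detected close to one time in four at 100 data points, and almost always once the sample runs into the thousands, without any theory having predicted anything.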

Figure 1. A simulation of the effect of sampling from different regions of a population distribution (Npop = 500000) in the presence of a crud factor, a population-level correlation between any two random variables. Each dot represents the number of significant results (p < .05) observed in 100 t-tests for independent groups of the size represented on the x-axis (10 – 100). Two random variables were generated for each population correlation: .1, .2, .3 (columns). One random variable was used to sample data points at or below the 10th (top row) or 25th (bottom row) percentile, and between the 25th and 75th percentile (comparison group). The means concern the aggregated values of the second random variable for each sampled case. The directional hypothesis tested against the null was (M[.25,.75] – M[0,.10]) > 0 or (M[.25,.75] – M[0,.25]) > 0.

Psychologists need to change the way they theorise about reality

The crud factor, or the violation of the ergodic condition, are not statistical errors that one can resolve by changing the way psychologists analyse their data. Resolving them requires adopting a different formalism about the measurement of properties of non-ergodic systems; it requires theories that make different kinds of predictions. No worries, such theories already exist and there are social scientists who use them. To encourage others, here are some examples of what can happen if one continues to assume the ergodic condition is valid and uses the prediction of signs of associations between variables (or group differences) as the ultimate epistemic tool for inferring scientific knowledge.

Suppose two variables x (e.g., a standardised reading ability test) and y (amount of music training received in childhood) were measured in samples drawn from a population that was cut into regions in order to compare dyslexic readers (the 10th percentile and lower, and the 25th and lower on variable x) and average readers (between the 25th and 75th percentile on variable x) on variable y. The sample size for each group was varied from 10 to 100 data points and 100 tests were performed for each group size. For each test a new random group sample was drawn.

Figure 1 represents the number of significant (p < .05) t tests found in the series of 100 tests conducted for each group size. If the crud factor were .1, then comparing to the samples from the 10th and 25th percentile would yield 25% significant results at group sizes of 44 and 58 data points respectively. The total study sample size would be 88 and 116. At this crud factor level the chances do not get much better than 1 in 4 corroborative events without there being any theory to pat on the back and grant some verisimilitude. When the correlation is .2, 25% significant tests can be expected at group sizes of 12 (10th) and 23 (25th) and at a correlation of .3 it’s 10 (10th) and 12 (25th) participants in each group to find 25% significant differences. The crud factor of .3 even implies that 100% of the conducted tests could give a significant result if the group size is larger than 87 and the dyslexic group is drawn from the 10th percentile of the population distribution of reading ability.
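The design described above can be approximated in a few lines. What follows is a rough sketch, not the code behind Figure 1: it assumes a smaller population (50,000 rather than 500,000), bivariate normal variables, and a normal approximation to the one-sided t-test for independent groups, so the counts it gives at small group sizes will be somewhat optimistic. All names are illustrative.

```python
import math
import random
from statistics import NormalDist, mean, variance

def make_population(rho, size, rng):
    """Return (x, y) pairs of standard-normal variables correlated at rho."""
    noise_sd = math.sqrt(1.0 - rho ** 2)
    population = []
    for _ in range(size):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + noise_sd * rng.gauss(0.0, 1.0)
        population.append((x, y))
    return population

def significant_fraction(rho, group_size, low_cut=0.10, n_tests=100,
                         pop_size=50_000, alpha=0.05, seed=1):
    """Fraction of n_tests in which the comparison group scores higher on y."""
    rng = random.Random(seed)
    population = sorted(make_population(rho, pop_size, rng))  # sorted on x
    # "Dyslexic" region: at or below the low_cut percentile of x.
    low = [y for _, y in population[: int(low_cut * pop_size)]]
    # Comparison region: between the 25th and 75th percentile of x.
    mid = [y for _, y in population[int(0.25 * pop_size): int(0.75 * pop_size)]]
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # assumption: z instead of t
    hits = 0
    for _ in range(n_tests):
        comparison = rng.sample(mid, group_size)  # a fresh sample for each test
        extreme = rng.sample(low, group_size)
        std_error = math.sqrt(variance(comparison) / group_size
                              + variance(extreme) / group_size)
        if (mean(comparison) - mean(extreme)) / std_error > z_crit:
            hits += 1
    return hits / n_tests

for rho in (0.0, 0.1, 0.3):
    print(rho, significant_fraction(rho, group_size=50))
```

Setting low_cut to 0.25 gives the other comparison, and looping over group sizes from 10 to 100 gives a rough counterpart of one panel of the figure.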

So, what's the use of a priori sample size calculations again? To get a sample size that will allow you to evidence just about anything you can(not) think of, as long as you limit your predictions to signs of associations (Figure 2). A real treat.

Figure 2. Same simulations as described in Figure 1, but for a range of crud factors between 0 and 0.4.


Diniz, A., Wijnants, M.L., Torre, K., Barreiros, J., Crato, N., Bosman, A.M.T., Hasselman, F., Cox, R.F.A., Van Orden, G.C., & Delignières, D. (2011). Contemporary theories of 1/f noise in motor control. Human Movement Science, 30(5), 889–905. doi:10.1016/j.humov.2010.07.006

Kello, C. T., Brown, G. D. A., Ferrer-i-Cancho, R., Holden, J. G., Linkenkaer-Hansen, K., Rhodes, T., & Van Orden, G. C. (2010). Scaling laws in cognitive sciences. Trends in Cognitive Sciences, 14(5), 223–232. 

Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70(3), 151–159.

Meehl, P. E. (1990). Why Summaries of Research on Psychological Theories Are Often Uninterpretable. Psychological Reports, 66(1), 195–244. doi:10.2466/PR0.66.1.195-244

Molenaar, P. C. M. (2008). On the Implications of the Classical Ergodic Theorems: Analysis of Developmental Processes has to Focus on Intra-Individual Variation. Developmental Psychobiology, 50(1), 60–69. doi:10.1002/dev

Molenaar, P. C. M., & Campbell, C. G. (2009). The New Person-Specific Paradigm in Psychology. Current Directions in Psychological Science, 18(2), 112–117. doi:10.1111/j.1467-8721.2009.01619.x

Friday, 20 September 2013

Defending Psychology in the Science Wars: Part 2 - Clone of the Attack

If a science is to have fair play, it is well for it if it does not become popular. […] A popular science, one that is "made easy," is not likely to have much vitality in it. The intension of a science is in the inverse ratio to the extension.
– J.H. Balfour Browne, Esq. (1870)

After my previous post (which immediately became the most "popular" post on my blog), some people noticed this is not a regular science blog and I am not a regular science writer. I had to clarify to some I do not aspire to be one. That is, I have no intention to contribute to popularising science via this blog, because that often means science "made easy" and I like my science vital, with just a little intension to produce maximal extension like Browne (1870).

I said somewhere that I should have preregistered my expectations about the responses I would get to my exposé of the Phantom Arguments. This was not really necessary, because it was a Clone of the Attack, as I described in the little fictional piece. I'll discuss some of it later on.

(There were also compliments! Thank you, I was hoping you would appreciate at least some of my subtext and meta-sarcasm... or is it irony when sarcasm can be used meta?)

There are other reasons to call this Clone of the Attack. For one, I do not want young scientists to become the clones of their supervisors, as can be evidenced from their responses to the attacks. I will discuss the roots of the Hard/Soft schism that seems to really get the blood boiling for some. First though, the attack was cloned to another target: the Economists are facing the same allegation... not a science. Their response is not very different from that of the soft psychologists in terms of phantom arguments.

Clone of the Attack: The Economists

This week's installment of "Academics Saying Dumb Shit in the Times": #andIhateeconomics

The link is to an article about the lack of corroboration by prediction in sciences studying the economy. Ok, I have to mention that the tone of the tweet is not representative; subsequent responses to a question I asked were more thoughtful and academic :) However, the arguments used in defence of Economics are mostly the same phantom arguments as psychologists use. I received a link to this paper: God Gave Physics the Easy Problems: Adapting Social Science to an Unpredictable World from the Journal of International Relations (more on the possibilities for a "hard" science of international relations and conflict later).

This is just the "physics is easy" argument. The authors say the paper is a plea for humility and they want to overcome physics envy. That's very sensible and all fine with me, but they suggest that the ideas about what makes a field of inquiry a genuine science are based on Newtonian physics and deductive logic, and that this is the wrong model to study social phenomena. Yes, but there is no physicist or philosopher of science that forced you to use Newtonian physics as a model. In fact, I believe they would advise against it. The belief in social science that Newtonian causality should be used to understand complex phenomena is "Newton's Curse", the very same I described in my previous post. It apparently also applies to the scientific study of International Relations and Economics.

Moreover, the suggestion that there is a definition, or measure in philosophy of science, or meta-theory (the empirical study of scientific theorising) that is somehow based on a model of Newtonian causality is of course completely false and qualifies as Academics Saying Dumb S.. stuff in scientific journals. The quantum formalism: 1930s. General relativity: 1910s. The great modern philosophers who defined what modern science is (Popper, Kuhn, Lakatos) based their ideas on modern physics (e.g., Kuhn's interviews with the founding fathers of QM). Here is Karl Popper reflecting on probability:

"I had always been convinced that the problem of the interpretation of the quantum theory was closely linked with the problem of the interpretation of probability theory in general, and that the Bohr-Heisenberg interpretation was the result of a subjectivist interpretation of probability. My early attempts to base the interpretation of quantum theory upon an objective interpretation of probability (it was the frequency interpretation) had led me to the following results.
(1) The so-called 'problem of the reduction of the wave packet' turns out to be a problem inherent in every probabilistic theory, and creates no special difficulty.
(2) Heisenberg's so-called indeterminacy relations must not be interpreted subjectively, as asserting something about our possible knowledge, or lack of knowledge, but objectively, as scatter-relations. (This removes an asymmetry between p and q which is inherent in Heisenberg's interpretation unless we link it with a phenomenalist or positivist philosophy; see my Logic of Scientific Discovery, p. 451.)
(3) The particles have paths, i.e. momentum and positions, although we cannot predict these, owing to the scatter relations.
(4) This was also the result of the imaginary experiment ('thought- experiment') of Einstein, Podolski, and Rosen.
(5) I also produced an explanation of the interference experiments ('two-slit-experiments'), but I later gave this up as unsatisfactory." Popper (1959, pp. 27-28) 

Very modern. If quantum logic can be a part of "hard" science, why not the complex, context sensitive, nonlinear, nested and circular causality studied by the social sciences? My response to the Economists and the International Relations scientists who want to use the "physics is easy" argument is the same as my response to the Psychologists: realise that you are saying:
"We are using the wrong tools to study the phenomena we are interested in and we are not planning to work very hard to try and fix it any time soon". 
I'll repeat, if these disciplines are serious about being a more difficult endeavour than physics and serious about being a science, then why don't they teach the methods and tools of quantum physics and general relativity to their undergraduates? That is the kind of formal language and abstract reasoning to depart from when they want to study those complex phenomena. It would be a logical, scientific thing to do: Start learning everything about the scientific methods and tools that made that easy science about dead matter so extremely accurate and successful.

Some authors are more equal than others...

Another thing happened that I expected to be cloned from my post. I complained about disciplines of science in which papers can be rejected because the maths and models are too difficult, and I expected to receive the very same complaint about my own post. Several people told me this, some hinted at it by sending a link, and one of those links was a confession about the phenomenon in economics! Here's an excerpt from the link I was sent:
"What every economist, and for that matter every writer on any subject, needs to realize is that unless you are a powerful person and people are looking for clues about what you’ll do next, nobody has to read what you write — and lecturing them about what they’re missing doesn’t help. You have to provide the hook, the pitch, whatever you want to call it, that pulls them in. It’s part of the job." 
Yes the prophets were terrible authors and I haven't found any hooks or pitches in the Bible, but am deeply annoyed to be lectured all the time on what I am missing.

I thought we were talking about science. Scientists have to read what other scientists write. Scientific papers are not novels and junior scientists should not be selected for tenure based on their communicative skills. In high school, most kids-now-scientists spent their time practicing abilities other than social skills. This kind of reasoning, taken seriously, would probably have resulted in discarding each and every one of Einstein's 1905 Miracle Year articles (groundbreaking, Nobel-worthy, legendary contributions to science) because they weren't pitched right.

If authors are allowed to ignore parts of the scientific record that may be relevant to their claims, it is no longer a discipline of science that has as its goal the accumulation of veridical knowledge about the structure of reality. It can't have that goal if you can dismiss any article by saying you didn't connect with its message after the first paragraph. Apparently Krugman is a powerful enough person to be allowed to hold this belief, and I apparently have to resort to Sesame Street language in the future or be ignored forever. What's next? Social scientists confessing they've never read The Old Man and the Sea (because there are no hooks to be found in that book) so they can ignore the claim that the Humanities are a science?

Here is a beautiful description of one of the few cases (in mathematics) in which I could understand a commentary such as Krugman's:

The Paradox of the Proof - By Caroline Chen

Then again, Shinichi Mochizuki didn't send in the proof for review, if I understand correctly, but mathematicians read and check the things their colleagues post as working or submitted papers.

Clone of the Attack: Finally Some Evidence for Priming?

I'll be (relatively) short about this: the introduction of my previous post, including the short story, had a different function than the main part containing the critique of the phantom arguments. Some specific terminology had a function (e.g., "average psychologists" in combination with measurement criticism), as did the choice of a specific style.

 I expected that responses ...
  1. ... would not be calm and erudite (occurred), even though I literally mentioned this is what often happens when you criticise soft psychology.
  2. ... would try to depict me as a member of an outgroup (occurred), even though I am a psychologist myself and even though I described that this is what happens to psychologists who express a non-average psychology opinion in the fictional story. I was even using my own name as the future outcast! 
  3. ... would attack the style (e.g., quality / intelligibility of communication) and ignore the content (occurred), even though I mentioned I consider such a thing unscientific. Or, in Meehl's words, "intellectually dishonest".
  4. ... would reveal some of my arguments are not understood. (I don't know yet, or does that count as: occurred?)
  5. More commentaries would appear based on arguments (see below) I did not discuss yet in my first post (occurred). But this was of course not due to the previous post; it's just that there were some phantom arguments left to be used.
  6. ... would accuse me of being rude and of making ad hominem attacks (occurred). This was expected because English is not my native tongue and I am Dutch, which can be a dangerous mix. Besides, if I have really been rude, I certainly did not intend to be more rude than the rudeness towards the author of the newspaper article (mentioned by name in all commentaries I responded to), who was basically called a dummy who didn't know S... science. I never used authors' names in my post; instead I addressed "Psychology" as the focus of my critique in my commentaries. I do understand that it was unfair to literally quote just one blogpost and leave the others uncommented. To make up for that, I comment on all of them in an appendix to this post, including new excuses that appeared: the deleted scenes of part 1.

Generally speaking, I wanted the younger scientists at the softer end of the spectrum to respond and maybe even to be pissed off. Why? Because we need them, psychological science needs them. However, we don't need them to be clones of their predecessors. They're bright and smart and they can communicate (yay!) and they have to be so much better than their advisors and current superiors ever had to be. We need them to become harder scientists, but in order to do so they really, really have to accept the fact that psychology is as soft as it gets where empirical science is concerned. You do not turn into a "hard" science when you use an MRI scanner or a gene-sequencing technique.

A New Hope... (oh... wait a minute, yep, it's in fact an old hope)
"Selectively using these qualifications as an excuse to exclude psychology and other “soft sciences” (excuse me while I roll my eyes so hard that I risk sending them permanently into the back of my head) from the scientific discipline without questioning the fact that “hard sciences” routinely address topics that are both “unnatural” and “unobservable” is simply lazy." 
"Psychologists like to weigh in on the psychology is a science perspective because we are engaging in upward social comparison--We want a seat at the table with the hard sciences, we want to be published in the most prestigious science journals, and we want a larger share of the grant funding from our government. In contrast, the harder sciences engage in downward social comparison with psychology--Hard sciences seek to maintain their elevated position in the science hierarchy, and sometimes they accomplish this by disparaging the softer sciences."

Unnatural and unobservable (or laziness) have nothing to do with the Hard/Soft divide, nor does the "desire" of the hard sciences to maintain an elevated position (that's an odd accusation, by the way). When scientists are asked to differentiate between different kinds of science, a consistent classification of disciplines is found along three dimensions: Hard/Soft, Pure/Applied and Life/Non-Life. The Hard/Soft dimension is often used to draw some line between the natural sciences and the rest of science. A consensus description of the divide is provided by Fanelli (2010), who showed that 91.5% of psychology / psychiatry papers report positive findings.

“[…] in some fields of research (which we will henceforth indicate as ‘‘harder’’) data and theories speak more for themselves, whereas in other fields (the ‘‘softer’’) sociological and psychological factors –for example, scientists’ prestige within the community, their political beliefs, their aesthetic preferences, and all other non-cognitive factors– play a greater role in all decisions made in research, from which hypothesis should be tested to how data should be collected, analyzed, interpreted and compared to previous studies.” (Fanelli, 2010, p. e10068)

This does not imply anything about the veracity of scientific claims made in any of those fields. It does imply that, for some reason, the evaluations of the veracity of scientific claims in the soft sciences are influenced by factors other than just the theory and the data (e.g., being a famous economist, you can disregard literature you believe is badly written). The deceptively simple and elegant statement “data and theories speak more for themselves” has been expressed in less elegant varieties in order to classify the social sciences as belonging to the softer fields of scientific research (I finally found the source again, so here is the literal version):

“After reading Meehl [1967] and Lykken [1968] one wonders whether the function of statistical techniques in the social sciences is not primarily to provide a machinery for producing phoney corroborations and thereby a semblance of ‘scientific progress' where in fact, there is nothing but an increase in pseudo-intellectual garbage. [...] Or, as Lykken put it: 'Statistical significance [in psychology] is perhaps the least important attribute of a good experiment; it is never a sufficient condition for claiming that a theory has been usefully corroborated, that a meaningful empirical fact has been established, or that an experimental report ought to be published.' [...] Thus the methodology of research programmes might help us in devising laws for stemming this intellectual pollution which may destroy our cultural environment even earlier than industrial and traffic pollution destroys our physical environment.” (Lakatos, 1970, p. 176, footnote 1, emphasis added)

Lakatos’ strong rejection of the kind of theorizing in social science, which he classifies as one of the worst kinds of ad hockery leading to degenerative research programmes (however: “we make no mockery of honest ad hockery”, Good, 1965), is not new and can be appended to a long list of critiques tracing back to the earliest conceptions of some fields of scientific inquiry. Often some discontent is expressed about the way the softer disciplines pretend to act like a genuine science (e.g., “semblance of scientific progress”). Some authors go further and claim no real efforts are undertaken to become a natural science; as if the soft fields are just dressing up to play “scientist”, putting on thick glasses, smoking pipes and learning Klingon just to give their claims that extra hint of profound intellectual scientific insight (“Cargo Cult Science” criticises the same kind of conduct, Feynman, 1974).

Statistical techniques are indeed often abused to cover up logical weaknesses in theories or to feign exactitude. Moreover, the directional hypothesis test is the weakest prediction one can test. The true nature of this critique is not statistical, but much more profound, as it predates the invention of most inferential statistics. For example, Ladd (1892), also quoted in the previous post, reviewing William James’ “The Principles of Psychology” (1890, spanning two volumes), concludes:

“Of the conception of psychology, its nature, problems, and method, which is proposed in these volumes, and of the defence in detail of this conception, the following statements seem to me true: The conception is such, and so narrow, that a consistent adherence to it compels us to admit the utter impossibility of establishing psychology as a natural science. It excludes almost all the really scientific data and conclusions; it includes only those data and conjectures which are most remote from genuine science.” (Ladd, 1892, p. 28, emphasis added)

The “utter impossibility” is explicated when Ladd comments on James’ suggestion that psychology is using the methods of the natural sciences to test deep hypotheses about its object of study when in fact:

“[…] psychology as a science, devoid of all postulating of "deeper-lying entities," does nothing of the kind. It assumes only the phenomena - the thoughts and feelings as actually known, and the possibility of ascertaining uniform relations among them.” (Ladd, 1892, pp. 29–30, emphasis added)

The emphasized passages are relevant and appear in Figure 1. They express key characteristics of scientific theorising still practiced in the soft sciences today that, in my opinion, can no longer be ignored as a major cause of the contemporary problems: soft science produces theories of construction, hard science produces theories of principles. In fact, to explain the previously mentioned high number of positive results reported in the soft sciences, Fanelli offered an explanation:
"1B-Deepness of hypotheses tested
This has been suggested to reflect the level of “maturation” of a science [56]. Younger, less developed fields of research should tend to produce and test hypotheses about observable relationships between variables (“phenomenological” theories) [Ladd 1892: “assumes the phenomena [...] ascertaining uniform relations among them”]. The more a field develops and “matures”, the more it tends to develop and test hypotheses about non-observable phenomena underlying the observed relationships (“mechanistic” theories) . These latter kinds of hypotheses reach deeper levels of reality, are logically stronger, less likely to be true, and are more conclusively testable [56]. [Ladd 1892: postulating of "deeper-lying entities,"]" (Fanelli, 2010, p. e10068, emphasis and quotes added)


I already claimed that Psychology is in fact not so very young and immature, at least, it should not be. It is at least 160 years old as an empirical science and so are many, many modern fields of science, including some of the harder ones. Here's Titchener who in 1893 calls the modern psychologist, the experimental psychologist:

“Modern Psychology surely began, not "three or four years ago," with the publication of the Willenshandlung, –but some forty years ago, with Fechner's notion of the definite functional correlation of psychical with physical processes. The modern psychologist is the experimental psychologist.” (Titchener, 1893, p. 456, emphasis added)
Don't think that social psychology wasn't around at the time; listen to what Ellwood (1899) has to say:

"Some sort of social psychology, it is true, has usually been assumed by social science; but the plea of this article is for a systematically worked out and carefully verified social psychology as a condition of complete social knowledge. For, if it be assumed that the phenomena of society are chiefly psychical, a knowledge of the psychical processes which characterize group-life as such is manifestly a most important condition of complete social knowledge. A few preliminary statements of position may, however, be helpful in rendering our plea more intelligible. Kulpe  [See Kulpe's Outlines of Psychology, translated by Titchener [1895], p. 7; cf. also the original. 656] speaks of social psychology as the science which "treats of the mental phenomena dependent upon a community of individuals." This we may accept as a rough, working definition of the science. 

Now, the assumption that there are "mental phenomena dependent upon a community of individuals" presupposes psychical processes which are more than merely individual, which are inter-individual; in last analysis it implies that through the action and reaction of individuals in a group upon one another there arise psychical processes which cannot be explained by reference to any or all of the individuals as such, but only by reference to the group-life considered itself as a unity. Social psychology, then, if somewhat more strictly defined, has as its task to examine and explain the form or mechanism of these group psychical processes. It is an interpretation of the psychical processes manifested in the growth and functioning of a group as a unity." (Ellwood, 1899, p. 656, emphasis added)
Ellwood describes here, based on earlier work by Kulpe, that social psychology should dispel Newton's curse: individuals interacting together in a group act as a whole, a unity whose emergent properties and behaviour cannot be attributed to the behaviour of its individual components. Wholes are not just “causally impotent epiphenomena, i.e. merely aggregates of microphysical constituents.” (van Leeuwen, 2009, p. 38).

Ellwood continues to say that the definition of a whole can be anything from a society to a family, and I would say anything more than 1 individual. I believe that what is seeping through here is one of the realisations that led to the great advances in physics: It's all about the relations between things (interactions) and not the things themselves:
“The aim of science is not things themselves, as the dogmatists in their simplicity imagine, but the relation between things” (Poincaré, 1905, p. xxiv). 
Psychological science keeps forgetting this fact, and those who point out its importance face banishment to a sub-discipline or -ism of psychology: J.J. Gibson's affordances (inspired by Kurt Lewin's valence), Hebb's reverberation (connectionism), etc. Here's Ashby, who argued as early as the 1940s to adopt the concept of self-organizing systems in psychology:
“It follows that a substantial part of the theory of organization will be concerned with properties that are not intrinsic to the thing but are relational between observer and thing.” (Ashby 1962, emphasis in original) 
It seems that at that time, in the late 1800s, the true schism into softer and harder fields of science had not occurred yet and a young modern psychologist could truly claim he was contributing to an immature, but advancing science that studied phenomena of the mind. The schism happened in the first half of the 20th century with the success of General Relativity and Quantum Mechanics. Around the time of the publication of James’ Principles and the defining of the field of Social Psychology there were actually discussions (e.g., Mach, 1891, 1892) on how physics and psychology could mutually inform each other:
“[…] the time has now come when each science should profit from the progress of the other. Physical science can better eliminate errors of observation by learning what is known of their cause and nature. Psychology will gain greatly in clearness and accuracy by using the methods of physics and mathematics.” (Cattell, 1893, p. 285).
I am quite certain contemporary psychological science doesn’t have a lot of scientific knowledge to provide on the cause and nature of anything that would be remotely relevant to the daily scientific work of a modern physicist (save the handful of social scientists who occasionally publish in physics journals, see below, I make a deep bow). About a century ago, people actually thought psychology and physics would go hand in hand... up up the ladder to the top of the hierarchy of sciences. Of course psychology did just that and used the methods of physics and mathematics and as it slowly matured, it gained greatly in clearness and accuracy just like Cattell predicted in 1893.

(damn, wrong timeline again)

Figure 1. Einstein's distinction between theories of construction and theories of principles (see van Dongen 2010, pp. 52-53). Quoted text is from Ladd's 1892 characterisation of William James' proposal for Psychology as an empirical science

How "hard science" do you want your social psychology?

There's a lot more to say about this, and I will at some point; I just finished a 50+ page chapter on the subject... :). In short, relativity happened, and modern cosmology and quantum mechanics happened, producing the most accurate predictions about phenomena in the universe ever measured, and physics and the harder sciences moved towards consensus formalism science: The theory and the data speak for themselves. No disputes about what priming is, how it should be measured and if you have to take into account whether people ate carrots the night before: Formal language, formal predictions, formal evaluation of precision and accuracy in a joint effort to understand the unobservable structure of the universe.

Funny, isn't it? Despite all the fundamental knowledge about social processes, individual drives and group dynamics, the social sciences cannot get their act together and finally start testing some theories based on precision and accuracy instead of cultural conventions, "prestige within the community, their political beliefs, their aesthetic preferences, and all other non-cognitive factors" (Fanelli, 2010). And the nerds can achieve consensus and they get to build spaceships and atom smashers by working together with 10,000 individuals towards the same common goal: A deeper understanding of reality.

I leave you with some references to work by social psychologists who are very much working in the tradition Ellwood outlined, but who resort to publishing in Physics and Technology journals, about intractable conflict, individual decisions in economics, close relationships in social networks and what not.

(Social) psychology  + hard science = not possible?
International conflict + hard science = not possible?
Economy                  + hard science = not possible?

  (Not what I am saying, this is what people keep telling me)

Ok, read a selection of some recent work by social psychologists Nowak, Vallacher and colleagues and then we'll discuss whether it is possible –in principle– for Social Psychology to be a hard science or not:

Liebovitch, L., Naudot, V., Vallacher, R., Nowak, A., Bui-Wrzosinska, L., & Coleman, P. (2008). Dynamics of two-actor cooperation–competition conflict models. Physica A: Statistical Mechanics and its Applications, 387(25), 6360–6378. doi:10.1016/j.physa.2008.07.020

Nowak, A., Kuś, M., Urbaniak, J., & Zarycki, T. (2000). Simulating the coordination of individual economic decisions. Physica A: Statistical Mechanics and its Applications, 287(3-4), 613–630. doi:10.1016/S0378-4371(00)00397-6

Nowak, A., & Vallacher, R. R. (2003). Synchronization Dynamics in Close Relationships: Coupled Logistic Maps as a Model of Interpersonal Phenomena. In W. Klonowski (Ed.), Frontiers on nonlinear dynamics: Vol. 2. From quanta to societies (pp. 165–180). Berlin: Pabst Science Publishers.

Staab, S., Domingos, P., Mike, P., Golbeck, J., Ding, L., Finin, T., Anupam, J., Nowak, A., Vallacher, R. R. (2005). Social networks applied. Intelligent Systems, IEEE, 20(1), 80–93.

Vallacher, R. R., Coleman, P. T., Nowak, A., & Bui-Wrzosinska, L. (2010). Rethinking intractable conflict: the perspective of dynamical systems. The American Psychologist, 65(4), 262–78. doi:10.1037/a0019290

(Yes, that's two papers in a physics journal and one IEEE, why not in JPSP I wonder?)
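
For readers who have never met the "coupled logistic maps" named in the Nowak and Vallacher chapter title, here is a minimal Python sketch of the general idea. Everything specific in it is my own assumption for illustration and is not taken from the chapter: the symmetric mixing rule, the parameter values, and the function names `logistic`, `simulate` and `mean_late_difference`.

```python
def logistic(x, r):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def simulate(r1, r2, coupling, steps=2000, x1=0.2, x2=0.7):
    """Iterate two logistic maps that each mix in the other's next state.

    coupling = 0 gives two independent maps; larger values pull the two
    "partners" towards each other. The mixing rule is an assumed, generic
    form of symmetric coupling. Returns |x1 - x2| at every step.
    """
    differences = []
    for _ in range(steps):
        # Each partner's own dynamics, computed before any mixing.
        f1, f2 = logistic(x1, r1), logistic(x2, r2)
        # Weighted average of own and partner's state keeps values in [0, 1].
        x1 = (f1 + coupling * f2) / (1.0 + coupling)
        x2 = (f2 + coupling * f1) / (1.0 + coupling)
        differences.append(abs(x1 - x2))
    return differences


def mean_late_difference(differences, tail=500):
    """Average distance between the partners over the last `tail` steps."""
    return sum(differences[-tail:]) / tail


if __name__ == "__main__":
    # Same control parameter for both partners, three coupling strengths.
    for coupling in (0.0, 0.1, 1.0):
        diffs = simulate(3.9, 3.9, coupling)
        print(coupling, mean_late_difference(diffs))
```

With a coupling of 0 the two maps never influence each other, and with a coupling of 1 this particular rule makes them identical after a single step. The interesting region is in between, where the pair is a relation between two things rather than two things: a formal object whose predictions can be evaluated for precision and accuracy.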

In the next part of the saga, I will provide more examples to the attacking physicist to show that the study of psychological phenomena can be "hard" science. A consequence may be that such studies will not be conducted by psychologists in the future, but at places such as the Google Campus or Boston Dynamics.


"There has long been snobbery in the sciences, with the "hard" ones (physics, chemistry, biology) considering themselves to be more legitimate than the "soft" ones (psychology, sociology)." (No, they can prove that the theories their community produces about the structure of reality are more precise and more accurate than the theories the soft sciences produce. This is not about "them" being more legitimate, their theories about reality are.)
"Many people benefit from the results, including those who, in their ignorance, believe that science is limited to the study of molecules." [proceeds to list examples of "successes of psychology"] (None of the critiques are saying science is limited to the study of molecules (a lot happened after Perrin, 1913); they are saying psychological phenomena are not studied in a scientifically rigorous manner. Moreover, the examples proving psychology is a science are proofs of application, the technology based on knowledge provided by the science: Therapy and intervention. That's comparing the effectiveness of cognitive behavioural therapy to treat clinical depression, to the very existence of smartphones, airplanes and the internet. Moreover, the most severe cases of depression still benefit from electroshock therapy and we don't know why. Did psychology help to reduce the rate of teenage pregnancies? The average birth rate for every 1000 adolescents aged 15-19 since 1996 is 7.7 in the Netherlands. In the US the number is 55.5, the highest in the developed world. Does the Netherlands have better psychologists who are not sharing their findings with the US? Ah yes, the achievement gap. Why are there no interventions to close the caucasian-asian gap?)
"I hope that most people who read Alex Berezow’s editorial in the Los Angeles Times denying that psychology is a science found it misinformed and bordering on absurd." (We are informed this conclusion can be drawn based on the article I commented on above. That is odd, because Berezow's editorial was a critique of the article I commented on above; he didn't agree with it and neither do I.)
"Unfortunately, there are still people out there who have a distorted and caricatured idea of what psychology is, a problem that Wilson was trying to combat. Sadly, the LA Times found one of these people and gave them editorial space to perpetuate their ignorance." (This is an elaborate way to say Berezow is an uneducated fool and the LA Times should have known better than to provide such people a stage for the nonsense that comes out of their keyboards. Still no real arguments to back these strong claims. If anything made a caricature of what psychology is, it's Wilson's "combat" article.)
"Berezow himself does a perfectly good job refuting his own claims when he tells us that his own example is a terrible one: "To be fair, not all psychology research is equally wishy-washy. Some research is far more scientifically rigorous. And the field often yields interesting and important insights." Well said. But let’s put Berezow’s abject ignorance of the empirical methods of psychological research aside for a moment" (Pssst... When he made the exception for some research, he wasn't talking about social psychological research and he did not refute his own claims and he most certainly did not tell any of us that his example was terrible... that's just what this blog author claims)  

"Some people (usually who know little about psychology) argue that psychologists don't define their terms clearly enough to be considered a science. In one example of this, a physicist named Alex Berezow (using a bunch of sciency terms that my poor psychologist brain struggled to understand) argued that happiness research is a perfect example of a failure to define terms. He states that "the meaning of the word differs from person to person and especially between cultures." (I have the least problems with this blog; it wasn't necessary to chip in on physicist bashing though. Now, usually, the people who know little about psychology and argue this are scientists who are used to defining terms using formal language like mathematics, or who postulate statements that are logically coherent. You must have heard of Hull's principles of behaviour? There have been and are enough psychologists who do so (e.g., mathematical psychology, ecological psychology). Simple example. Happiness could be something you want to describe as "takes on different meanings in different contexts (i.e., it can look different in a different culture) but has some core universal content that is retained and we call this happiness". You could use a mathematical object that is transformation invariant, or symmetric with respect to the property or dimension of happiness. A position on the dimension itself would be someone's personal experience of happiness. Go from there to define operators and what not with which you can understand statistical regularities in happiness data. Don't say "can't" without trying first!)

I should have continued with two arguments I wanted to address, because two posts appeared that more or less used them:

5. "But science cannot be defined anyway, so what are you talking about?"

I really like almost everything the author of this post writes, but on this argument I disagree completely. Judging from the commentaries, I am not the only one. Here's one that sums it up for me:
"Psychology continues to discredit itself. Perhaps better than even "is it true," would be to ask "can it manage to sustain a continued session of tests of its theories" or does it just continue to discredit itself."
(By the way, very glad to see many comments reflect some of the same problems I have with the defence strategy. Someone nicknamed Seriously? on that page is not me, but it could very well have been. Excellent comments!) 

6. "But we need funding, so look at all the stuff we are about to do correctly in the future"

This is an honest post about the difference between quantitative and qualitative science and I link to it because it mentions the motivation some people may have to use the phantom arguments I dismissed as nonsense. It also explains why people are viciously trying to draw attention away from the actual content of the criticism by the hard sciences (e.g., discrediting the messenger, playing the role of victim of a bully): Money, Funding of research. The author is correct to identify this motivation; I have seen the argument pop up in tweets, posts, commentaries and what not. In fact I closed my previous post with Meehl's (1990) suggestion to save some taxpayer money:
"I am prepared to argue that a tremendous amount of taxpayer money goes down the drain in research that pseudotests theories in soft psychology and that it would be a material social advance as well as a reduction in what Lakatos has called “intellectual pollution” (Lakatos, 1970, fn. 1 on p. 176) if we would quit engaging in this feckless enterprise. "






(Sorry, it just came out like this, capitalised and all. 
You = Soft Psychology, I suppose)

So there you have it: instead of agreeing and publishing the criticism like Meehl did, and devoting energy and resources to turning Psychology into a natural science, one that studies human nature, it is suggested we play a game of politics and power. At this point I think it would be a blessing if Psychology was forced to prove to society it is worth funding by producing knowledge that results in reliable technology (accurate diagnosis of mental illness, efficient psychotherapy, resolution of international conflicts by diplomacy, eradication of bullying, fundamental knowledge about perception and action so we can finally have robots to take care of us, etc.).

(And no, none of those examples in parentheses have already been achieved.
Don't tempt me.)


Cattell, J. (1893). On Errors of Observation. The American Journal of Psychology, 5(3), 285–293.

van Dongen, J. (2010). Einstein’s unification. Cambridge: Cambridge University Press.

Ellwood, C. (1899). Prolegomena to Social Psychology. I. The Need of the Study of Social Psychology. The American Journal of Sociology, 4(5), 656–665.

Good, I. J. (1965). The estimation of probabilities: An essay on modern Bayesian methods (Vol. 30). Cambridge, MA: MIT Press.

Ladd, G. (1892). Psychology as So-Called “Natural Science.” The Philosophical Review, 1(1), 24–53.

Mach, E. (1891). Some questions of psycho-physics. Sensations and the elements of reality. The Monist, 1(3), 393–400.

Mach, E. (1892). Facts and mental symbols. The Monist, 2(2), 198–208.

Popper, K. R. (1959). The Propensity interpretation of probability. The British Journal for the Philosophy of Science, 10(37), 25–42.

Titchener, E. (1893). Two Recent Criticisms of “Modern” Psychology. The Philosophical Review, 2(4), 450–458.