Lykken (1968) estimated that the “unrelated” molar variables involved in most studies in psychology share 4–5% common variance, which, with zero measurement error, corresponds to an expected correlation of about .20 between any pair of them (r = √.04 ≈ .20). The exact figure depends on the field of inquiry, but it has been suggested that estimates between .15 and .35 are by no means an exaggeration.
The origins of such correlations are debated (and of course disputed), but I consider them an example of the violation of the ergodic theorems for studying human behaviour and development (Molenaar & Campbell, 2009; Molenaar, 2008). The ergodic condition applies to systems whose current state in a state/phase space (the space describing all states the system could theoretically be in) is only very weakly, or not at all, influenced by its history or its initial conditions. Hidden Markov models are an example of such systems. These systems have no "memory" of their initial state, which formally means that their time-averaged trajectories through phase space are approximately equal to their space-averaged trajectories. Given enough time, they will visit all regions of the phase space (formally there is a difference between phase space and state space, which I will ignore here).
For Psychological Science the ergodic assumptions related to probability theory are important: in an ergodic system it does not matter whether you measure a property of the system 100 times as a repeated measurement (time average), or measure that property of 100 ergodic systems at the same point in time (space average). The latter is of course the sample of participants from which inferences are drawn in social science; the former would be repeated measurements within a single subject. In an ergodic system, the averages of these two types of measurement would be the same: for the expected average result it does not matter whether you roll 1 die 100 times in a row, or 100 dice in 1 throw.
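To make the dice example concrete, here is a minimal sketch (my own illustration, using simulated fair dice) of how the time average and the space average coincide for an ergodic process:

```python
# A minimal sketch of the dice example: for an ergodic "system" (a fair die),
# the time average of one die rolled many times and the ensemble average of
# many dice rolled once converge on the same expected value (3.5).
import numpy as np

rng = np.random.default_rng(42)

# Time average: one die rolled 100 times in a row.
one_die_repeated = rng.integers(1, 7, size=100)
time_average = one_die_repeated.mean()

# Ensemble (space) average: 100 dice rolled once, at the same "moment".
hundred_dice_once = rng.integers(1, 7, size=100)
ensemble_average = hundred_dice_once.mean()

print(f"time average     = {time_average:.2f}")
print(f"ensemble average = {ensemble_average:.2f}")
# Both averages fluctuate around 3.5 and converge to it as the number of
# rolls/dice grows -- the ergodic condition at work.
```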
Trick or treat?
Now, the trick question is: do you think this is the case for psychological variables? Would I get the same developmental trajectory if I measured IQ in a single human being each year from age 1 to 80 (assuming I have a zero-error, unbiased IQ measuring instrument and a very long lifespan) as when I drew a sample of 80 participants aged 1 through 80 and measured their IQ on a single occasion? Very few scientists would predict I would obtain the same results in both situations, yet in social science we do act as if that were the case. To me, any evidence of a system's future state being influenced by a state at a previous point in time (memory) is a violation of the ergodic condition, and should basically be a signal to a scientist to stop using central tendency measures and sampling theory to infer knowledge about the properties of that system. If you do not want to go that far, but still feel uncomfortable about my IQ example, you should probably accept that there may be some truth to Lykken's suggestion of a default ambient correlation between variables in social science. Simply put: if you walk like a duck, there is a small base expectancy that you will also talk like a duck.
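For contrast, here is a hedged sketch of what a violation looks like. It compares an ergodic process (independent draws) with a simple non-ergodic one (a random walk, chosen by me purely as an illustration of a process whose current state depends on its history), and checks whether time averages and space averages agree:

```python
# A hedged sketch (my own illustration, not from the text): compare an ergodic
# process (i.i.d. noise) with a non-ergodic one (a random walk, whose current
# state carries its entire history). For the random walk, the time average of
# a single trajectory does not converge to the ensemble average.
import numpy as np

rng = np.random.default_rng(1)
n_steps, n_systems = 1000, 1000

# Ergodic: independent draws, no memory of previous states.
iid = rng.normal(0, 1, size=(n_systems, n_steps))
# Non-ergodic: cumulative sum of the same draws -> each state depends on history.
walk = iid.cumsum(axis=1)

for name, data in [("i.i.d. noise", iid), ("random walk", walk)]:
    time_avg = data[0].mean()                      # one system followed over time
    ensemble_avg = data[:, -1].mean()              # many systems at one time point
    spread_of_time_avgs = data.mean(axis=1).std()  # disagreement among time averages
    print(f"{name:12s}  time avg of system 1 = {time_avg:+.2f}  "
          f"ensemble avg = {ensemble_avg:+.2f}  "
          f"SD of per-system time avgs = {spread_of_time_avgs:.2f}")
# For i.i.d. noise the per-system time averages all hover near the ensemble
# average; for the random walk they scatter widely -- averaging over "people"
# then tells you little about any one trajectory.
```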
Another line of evidence revealing that everything is correlated (over time), or "has memory", is of course the ubiquitous fractal scaling found in repeated measurements of human physiology and performance (e.g., Kello et al., 2010). Interdependent rather than independent measurements do not necessarily point to a violation of the ergodic condition, but combined, the two frameworks do predict very different measurement outcomes in certain contexts (e.g., Diniz et al., 2011). My money is still on the "long memory" interpretation.
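For readers who want a feel for what fractal scaling means in a data series, the following rough sketch (the synthesis method and parameter choices are mine, not taken from the papers cited above) generates noise with a 1/f-type spectrum and recovers its scaling exponent from the log-log slope of the power spectrum:

```python
# A rough sketch (assumptions mine) of what "fractal scaling" looks like in a
# time series: synthesise 1/f**beta noise by shaping the Fourier spectrum of
# random phases, then recover the scaling exponent from the log-log slope of
# the power spectrum. White noise gives a slope near 0, "pink" noise near -1.
import numpy as np

def synthesize_noise(beta, n, rng):
    """Spectral synthesis of noise with power spectrum ~ 1/f**beta."""
    freqs = np.fft.rfftfreq(n)
    amplitudes = np.zeros_like(freqs)
    amplitudes[1:] = freqs[1:] ** (-beta / 2)        # shape the amplitude spectrum
    phases = rng.uniform(0, 2 * np.pi, size=freqs.size)
    return np.fft.irfft(amplitudes * np.exp(1j * phases), n=n)

def spectral_slope(x):
    """Estimate the log-log slope of the periodogram (crude scaling exponent)."""
    freqs = np.fft.rfftfreq(x.size)[1:]
    power = np.abs(np.fft.rfft(x))[1:] ** 2
    return np.polyfit(np.log10(freqs), np.log10(power), 1)[0]

rng = np.random.default_rng(7)
n = 2 ** 14
white = rng.normal(size=n)
pink = synthesize_noise(beta=1.0, n=n, rng=rng)
print(f"white noise slope ~ {spectral_slope(white):+.2f}")   # expected near 0
print(f"1/f noise slope   ~ {spectral_slope(pink):+.2f}")    # expected near -1
```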
Based on the lower estimates of Lykken's correlation, the expected difference between any sample-based averages would be about 0.5 standard deviations. The test against a null hypothesis of “no association” is therefore often a test against a “straw man”: it can be known in advance that an assumption of no association at all is false. A researcher can thus maximise their chances of corroborating any weak prediction of an association between variables simply by making sure a large enough number of data points is collected. You know, those statistical power recommendations you have been hearing about for a while now. A genuine “crud factor” (cf. Meehl, 1990) implies a researcher has a chance of about 1 in 4 of evidencing an association using a sample size of 100 data points, without even needing a truth-like theory to predict the association or its sign.
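As a back-of-the-envelope check on that claim (the alpha level, the two-sided test, and the Fisher z approximation are my assumptions, not Meehl's or Lykken's), one can compute how often a correlation test would come out significant when nothing but ambient crud is present:

```python
# Approximate power to "detect" an ambient crud correlation with N = 100 data
# points, testing H0: rho = 0 two-sided at alpha = .05, via the standard
# Fisher z approximation for the sampling distribution of r.
import numpy as np
from scipy.stats import norm

def power_for_r(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0."""
    z_crit = norm.ppf(1 - alpha / 2)
    z_effect = np.arctanh(r) * np.sqrt(n - 3)      # Fisher z, scaled by 1/SE
    # Probability of landing beyond either critical bound.
    return norm.sf(z_crit - z_effect) + norm.cdf(-z_crit - z_effect)

for crud in (0.15, 0.20, 0.25, 0.30):
    print(f"crud r = {crud:.2f}, N = 100 -> power ~ {power_for_r(crud, 100):.2f}")
# The exact figure depends on the assumed crud value and test, but even the
# lower estimates give a substantial hit rate with no theory behind them.
```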
Psychologists need to change the way they theorise about reality
The crud factor, or the violation of the ergodic condition, is not a statistical error that can be resolved by changing the way psychologists analyse their data. It requires adopting a different formalism for measuring the properties of non-ergodic systems; it requires theories that make different kinds of predictions. No worries: such theories already exist and there are social scientists who use them. To encourage others, here are some examples of what can happen if one continues to assume the ergodic condition is valid and uses the prediction of signs of associations between variables (or group differences) as the ultimate epistemic tool for inferring scientific knowledge.
Suppose two variables x (e.g., a standardised reading ability test) and y (amount of music training received in childhood) were measured in samples drawn from a population that was cut into percentile regions on x, in order to compare dyslexic readers (scoring at or below the 10th percentile, or at or below the 25th percentile) with average readers (scoring between the 25th and 75th percentile) on variable y. The sample size for each group was varied from 10 to 100 data points and 100 tests were performed for each group size; for each test a new random group sample was drawn. A sketch of such a simulation is given below.
So, what's the use of a priori sample size calculations again? To get a sample size that will allow you to evidence just about anything you can(not) think of, as long as you limit your predictions to signs of associations (Figure 2). A real treat.
Figure 2. Same simulations as described in Figure 1, but for a range of crud factors between 0 and 0.4.
References
Diniz, A., Wijnants, M. L., Torre, K., Barreiros, J., Crato, N., Bosman, A. M. T., Hasselman, F., Cox, R. F. A., Van Orden, G. C., & Delignières, D. (2011). Contemporary theories of 1/f noise in motor control. Human Movement Science, 30(5), 889–905. doi:10.1016/j.humov.2010.07.006
Kello, C. T., Brown, G. D. A., Ferrer-i-Cancho, R., Holden, J. G., Linkenkaer-Hansen, K., Rhodes, T., & Van Orden, G. C. (2010). Scaling laws in cognitive sciences. Trends in Cognitive Sciences, 14(5), 223–232.
Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70(3), 151–159.
Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195–244. doi:10.2466/PR0.66.1.195-244
Molenaar, P. C. M. (2008). On the implications of the classical ergodic theorems: Analysis of developmental processes has to focus on intra-individual variation. Developmental Psychobiology, 50(1), 60–69. doi:10.1002/dev
Molenaar, P. C. M., & Campbell, C. G. (2009). The new person-specific paradigm in psychology. Current Directions in Psychological Science, 18(2), 112–117. doi:10.1111/j.1467-8721.2009.01619.x