Sunday, 6 October 2013

Respect your elders: Lykken's (1968) correlated ambient noise: Do fractal scaling and violations of the ergodic condition evidence the crud factor?

Lykken (1968) estimated that the “unrelated” molar variables involved in most studies in psychology share 4-5% common variance, meaning that, with 0 measurement error, a correlation of about .20 (the square root of .04) can be expected between any two of them. The exact value depends on the field of inquiry, but it has been suggested that estimates between .15 and .35 are by no means an exaggeration.

The origins of such correlations are debated (and of course disputed), but I consider them an example of the violation of the ergodic theorems for studying human behaviour and development (Molenaar & Campbell, 2009; Molenaar, 2008). The ergodic condition applies to systems whose current state in a state/phase space (which describes all the theoretically possible states a system could be in) is very weakly, or not at all, influenced by its history or its initial conditions. Hidden Markov models are an example of such systems. These systems have no "memory" for their initial state, and formally this means their time-averaged trajectories through phase space are about equal to their space-averaged trajectories. Given enough time, they will visit all the regions of the phase space (formally there's a difference between phase and state space, which I will ignore here).

For Psychological Science the ergodic assumptions related to probability theory are important: In an ergodic system it does not matter whether you measure a property of the system 100 times as a repeated measurement (time average), or measure the property of 100 ergodic systems at the same point in time (space average). The latter is of course the sample of participants from which inferences are drawn in social science. The former would be repeated measurements within a single subject. In an ergodic system, the averages of these different types of measurement would be the same. It does not matter for the expected averaged result whether you roll 1 die 100 times in a row, or 100 dice in 1 throw.
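The dice example can be sketched in a few lines of Python. This is only an illustration under the assumption of fair, independent dice; the function names `time_average` and `space_average` and the seeds are made up for this sketch.

```python
import random
from statistics import fmean

def time_average(n_rolls, seed):
    """Roll ONE die n_rolls times in a row and average the outcomes."""
    die = random.Random(seed)
    return fmean(die.randint(1, 6) for _ in range(n_rolls))

def space_average(n_dice, seed):
    """Roll n_dice separate dice ONCE each and average the outcomes."""
    seeder = random.Random(seed)
    # Every die gets its own independently seeded generator.
    return fmean(
        random.Random(seeder.getrandbits(32)).randint(1, 6)
        for _ in range(n_dice)
    )

for n in (100, 10_000):
    print(n, time_average(n, seed=1), space_average(n, seed=2))
```

Both averages should settle near 3.5 as n grows, because a die has no memory: who is rolled, and when, makes no difference to the expected result.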

Trick or treat?

Now, the trick question is, do you think such is the case for psychological variables? Would I get the same developmental trajectory if I measured IQ in a single human being each year from 1 to 80 (assuming I have a 0-error, unbiased IQ-measuring instrument and a very long lifespan) as when I drew a sample of 80 participants aged 1 through 80 and measured their IQ on one occasion? Very few scientists would predict that I would obtain the same results in both situations, but in social science we do act as if such would be the case. To me, any evidence of a system's future state being influenced by a state at a previous point in time (memory) is a violation of the ergodic condition and basically should indicate to a scientist to stop using central tendency measures and sampling theory to infer knowledge about the properties of this system. If you do not want to go that far, but still feel uncomfortable about my IQ example, you should probably accept that there may be some truth to Lykken's suggestion about a default ambient correlation between variables in social science. Simply put, if you walk like a duck, there is a small base expectancy that you will also talk like a duck.
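To see what "memory" does to the two kinds of average, here is a minimal sketch that uses a simple random walk as a stand-in for a system whose next state depends on its previous state. The random walk is my choice of toy example, not a model of IQ or of any psychological variable, and the names `random_walk`, `ensemble_average` and `time_averages` are hypothetical.

```python
import random
from statistics import fmean, pstdev

def random_walk(n_steps, rng):
    """Each position is the previous position plus or minus 1 (memory)."""
    position, path = 0, []
    for _ in range(n_steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path

rng = random.Random(7)
walks = [random_walk(1000, rng) for _ in range(500)]

# "Space" average: 500 walkers measured once, at the final time point.
ensemble_average = fmean(walk[-1] for walk in walks)

# Time averages: each walker measured 1000 times, averaged per walker.
time_averages = [fmean(walk) for walk in walks]

print("ensemble average at the last step:", ensemble_average)
print("spread (SD) of individual time averages:", pstdev(time_averages))
```

The ensemble average should sit close to zero, while the individual time averages scatter widely around it: in this toy system the group average says little about any single trajectory.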

Another line of evidence revealing that everything is correlated (over time), or "has memory", is of course ubiquitous fractal scaling in repeated measurements of human physiology and performance (e.g., Kello et al., 2010). If measurements are interdependent rather than independent, it does not necessarily point to a violation of the ergodic condition, but the two frameworks do predict very different measurement outcomes in certain contexts (e.g., Diniz et al., 2011). My money is still on the "long memory" interpretation.

Based on the lower estimates of Lykken's correlation, the expected difference between any sample-based averages would be about 0.5 standard deviations. The test against a null hypothesis of “no association” is often a test against a “straw man” null hypothesis, because it can be known in advance that an assumption of no association at all is false. Therefore, a researcher can maximize their chances of corroborating any weak prediction of association between variables by making sure a large enough number of data points is collected. You know, those statistical power recommendations you have been hearing about for a while now. A genuine “crud factor” (cf. Meehl, 1990) implies a researcher has a chance of 1 in 4 to evidence an association using a sample size of 100 data points, without even needing a truth-like theory to predict an association or its sign.
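One way to get a feel for that 1-in-4 figure is a back-of-the-envelope power calculation. The sketch below is my own reading and not necessarily how the number was derived: it assumes a one-sided test of a single correlation at alpha = .05 and uses the Fisher z approximation. The function name `approx_power` is hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(r, n, alpha=0.05):
    """Approximate power of a one-sided test of rho = 0 when the true
    correlation is r, using Fisher's z transform (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    shift = math.atanh(r) * math.sqrt(n - 3)
    return NormalDist().cdf(shift - z_crit)

for r in (0.1, 0.2, 0.3):
    print(f"crud factor {r}: power at n = 100 is about {approx_power(r, 100):.2f}")
```

Under these assumptions a crud factor of .1 already gives roughly a 1 in 4 chance of a "significant" association with 100 data points, and the chance climbs quickly for larger crud factors.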

Figure 1. A simulation of the effect of sampling from different regions of a population distribution (Npop = 500000) in the presence of a crud factor, a population-level correlation between any two random variables. Each dot represents the number of significant results (p < .05) observed in 100 t-tests for independent groups of the size represented on the x-axis (10 – 100). Two random variables were generated for each population correlation: .1, .2, .3 (columns). One random variable was used to sample data points at or below the 10th (top row) or 25th (bottom row) percentile, and between the 25th and 75th percentile (comparison group). The means concern the aggregated values of the second random variable for each sampled case. The directional hypothesis tested against the null was (M[.25,.75] – M[0,.10]) > 0 or (M[.25,.75] – M[0,.25]) > 0.

Psychologists need to change the way they theorise about reality

The crud factor, or the violation of the ergodic condition, is not a statistical error that one can resolve by changing the way psychologists analyse their data. It requires adopting a different formalism about the measurement of properties of non-ergodic systems, and it requires theories that make different kinds of predictions. No worries, such theories already exist and there are social scientists who use them. To encourage others, here are some examples of what can happen if one continues to assume the ergodic condition is valid and to use the prediction of signs of associations between variables (or group differences) as the ultimate epistemic tool for inferring scientific knowledge.

Suppose two variables x (e.g., a standardised reading ability test) and y (amount of music training received in childhood) were measured in samples drawn from a population that was cut into regions in order to compare dyslexic readers (the 10th percentile and lower, and the 25th and lower on variable x) and average readers (between the 25th and 75th percentile on variable x) on variable y. The sample size for each group was varied from 10 to 100 data points and 100 tests were performed for each group size. For each test a new random group sample was drawn.
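The procedure just described can be approximated with the Python standard library alone. This is a sketch under several assumptions of mine, not the code behind Figure 1: the population is reduced to 50,000 cases to keep it fast, both variables are standard normal, the test is a pooled-variance t-test, and the critical t value comes from a Cornish-Fisher approximation rather than an exact t distribution. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def t_critical(df, alpha=0.05):
    """Approximate one-sided critical t value (Cornish-Fisher expansion)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return (z + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))

def make_population(r, size, rng):
    """Return (x, y) pairs, sorted by x, with population correlation r."""
    pop = []
    for _ in range(size):
        x = rng.gauss(0, 1)
        y = r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        pop.append((x, y))
    pop.sort()  # sorted by x, so percentile regions are simple slices
    return pop

def count_significant(pop, low_pct, n, n_tests, rng, alpha=0.05):
    """Count tests in which M[.25,.75] - M[0,low_pct] > 0 at p < alpha."""
    size = len(pop)
    low = [y for _, y in pop[: int(size * low_pct)]]
    mid = [y for _, y in pop[int(size * 0.25): int(size * 0.75)]]
    crit = t_critical(2 * n - 2, alpha)
    hits = 0
    for _ in range(n_tests):
        a = rng.sample(mid, n)  # a fresh "average reader" sample per test
        b = rng.sample(low, n)  # a fresh "dyslexic reader" sample per test
        pooled_var = (variance(a) + variance(b)) / 2  # equal group sizes
        t = (fmean(a) - fmean(b)) / math.sqrt(pooled_var * 2 / n)
        if t > crit:
            hits += 1
    return hits

rng = random.Random(2013)
for r in (0.1, 0.2, 0.3):
    pop = make_population(r, 50_000, rng)
    for low_pct in (0.10, 0.25):
        counts = {n: count_significant(pop, low_pct, n, 100, rng)
                  for n in (10, 50, 100)}
        print(f"crud {r}, lowest {int(low_pct * 100)}%: {counts}")
```

Because the groups are selected on x only, every significant difference on y in this sketch is produced by the crud factor alone. Exact counts will vary with the seed and will not reproduce the figures one-to-one.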

Figure 1 represents the number of significant (p < .05) t-tests found in the series of 100 tests conducted for each group size. If the crud factor were .1, then comparing against samples from the 10th and 25th percentile would yield 25% significant results at group sizes of 44 and 58 data points respectively. The total study sample sizes would be 88 and 116. At this crud factor level the chances do not get much better than 1 in 4 corroborative events without there being any theory to pat on the back and grant some verisimilitude. When the correlation is .2, 25% significant tests can be expected at group sizes of 12 (10th) and 23 (25th), and at a correlation of .3 it takes 10 (10th) and 12 (25th) participants in each group to find 25% significant differences. The crud factor of .3 even implies that 100% of the conducted tests could give a significant result if the group size is larger than 87 and the dyslexic group is drawn from the 10th percentile of the population distribution of reading ability.

So, what's the use of a priori sample size calculations again? To get a sample size that will allow you to evidence just about anything you can(not) think of, as long as you limit your predictions to signs of associations (Figure 2). A real treat.

Figure 2. Same simulations as described in Figure 1, but for a range of crud factors between 0 and 0.4.


Diniz, A., Wijnants, M. L., Torre, K., Barreiros, J., Crato, N., Bosman, A. M. T., Hasselman, F., Cox, R. F. A., Van Orden, G. C., & Delignières, D. (2011). Contemporary theories of 1/f noise in motor control. Human Movement Science, 30(5), 889–905. doi:10.1016/j.humov.2010.07.006

Kello, C. T., Brown, G. D. A., Ferrer-i-Cancho, R., Holden, J. G., Linkenkaer-Hansen, K., Rhodes, T., & Van Orden, G. C. (2010). Scaling laws in cognitive sciences. Trends in Cognitive Sciences, 14(5), 223–232. 

Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70(3), 151–159.

Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195–244. doi:10.2466/PR0.66.1.195-244

Molenaar, P. C. M. (2008). On the implications of the classical ergodic theorems: Analysis of developmental processes has to focus on intra-individual variation. Developmental Psychobiology, 50(1), 60–69. doi:10.1002/dev

Molenaar, P. C. M., & Campbell, C. G. (2009). The new person-specific paradigm in psychology. Current Directions in Psychological Science, 18(2), 112–117. doi:10.1111/j.1467-8721.2009.01619.x