Friday, 5 July 2013

Representative samples, Generalisations, Populations and Property attribution

This is a reply to a reply to my reply to a post by +Daniel Simons.

What are described as examples of generalisation problems are problems that occur when a sample is not representative of the population in which you expected to observe a phenomenon, variable, or trait. In general, it is the phenomenon predicted by a theory or hypothesis that determines what the population is. That is because the population you sample from is a theoretical construct, a mathematical model whose parameters you are going to estimate (e.g., Fiedler & Juslin, 2006). The parameters define a probability distribution of values around the true value of the predicted population trait. Therefore, the population is always defined as the ensemble in which the true value of the trait can be observed as the outcome of a measurement procedure.

(You remember I don't believe in true scores, right?)

If the phenomenon of “racial bias” is operationalised in a specific culture, the racial bias theory used will likely tell a researcher to select stimuli capable of measuring bias in that culture. Still, one would generalise the results to a population in which racial bias is, in principle, an observable phenomenon. Unless, of course, this theory can only predict a very specific bias in a very specific culture. I am not an expert, but I cannot believe every culture needs its own personal racial bias theory.

In the same way, if a researcher is only interested in discrimination between the colours red and green, then the results do not generalise to a population that includes people with deuteranopia. But that restriction is already specified by the phrase "capable of performing the task", which is the same as "a population in which the phenomenon may be observed".

If you use the red-green discrimination experiment as a measurement procedure in a sample that is not representative of a population in which the phenomenon may be observed (e.g., a sample with many participants with deuteranopia), it would be wrong to conclude that humans in general cannot discriminate between the two colours. But the cause of that error is not the measurement procedure, the predictions, or the theory that prompted the predictions and the measurement procedure. It is a problem of the representativeness of the sample.
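To make the red-green example concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than data: the function name `simulate_sample_accuracy`, the per-trial success probabilities (0.95 for participants who can perform the task, 0.50 for participants with deuteranopia) and the sample sizes are all hypothetical. The same measurement procedure is run twice; only the composition of the sample changes.

```python
import random


def simulate_sample_accuracy(n_participants, prop_deuteranopia, n_trials=100, seed=0):
    """Mean proportion correct in a simulated two-choice red-green task."""
    rng = random.Random(seed)
    # Hypothetical per-trial success probabilities (assumptions, not data).
    p_correct_typical = 0.95
    p_correct_deuteranopia = 0.50  # chance level in a two-choice task
    scores = []
    for _ in range(n_participants):
        # Decide whether this simulated participant has deuteranopia.
        has_deuteranopia = rng.random() < prop_deuteranopia
        p = p_correct_deuteranopia if has_deuteranopia else p_correct_typical
        # Same measurement procedure for everyone: n_trials discrimination trials.
        n_correct = sum(rng.random() < p for _ in range(n_trials))
        scores.append(n_correct / n_trials)
    return sum(scores) / len(scores)


# Sample drawn from the population in which the phenomenon may be observed.
in_population = simulate_sample_accuracy(200, prop_deuteranopia=0.0)
# Unrepresentative sample: many participants with deuteranopia.
unrepresentative = simulate_sample_accuracy(200, prop_deuteranopia=0.6)

print(f"sample from the predicted population: {in_population:.2f}")
print(f"unrepresentative sample:              {unrepresentative:.2f}")
```

The procedure, the prediction and the "theory" (the two success probabilities) are identical in both runs. The lower summary statistic in the second run comes only from who was sampled.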

If one is truly concerned about representation, the proper thing to do, in my opinion, is to replicate the results in a representative sample. That is not as convenient as convenience sampling, but it is still much more convenient than building a Large Hadron Collider or sending satellites into space in order to observe phenomena. Therefore, a smart journalist who reads a generalisation warning on a paper about a very general human quality such as emotion, attention, stress, anxiety or happiness influencing some decision, judgement, performance or overt behaviour should ask: "But why didn’t you bother to measure your variables in a representative sample?"

(The point I actually wanted to make was)

If you operationalise the phenomenon colour discrimination as an object of measurement that is observable as a summary statistic at the level of the sample (a latent trait or variable), you have in fact predicted a population in which colour discrimination may be observed, and it is from that population that you sample. You hope to have sampled enough of the population property to indeed observe it at the level of the sample. Our models do not allow a way back (or at least not an easy one) from the predicted or inferred trait of the population to the specific characteristics of an individual in that population, that is, to "anyone with normal colour perception" (e.g., Borsboom et al., 2003; Ellis & Wollenberg, 1993).
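One way to see why there is no easy way back from the sample level to the individual is a small sketch like the following. It assumes a made-up population consisting of two subgroups with typical scores of 40 and 60; these numbers and the helper `draw_individual_score` are hypothetical, chosen only to make the point visible, and they are just one of many ways such a mismatch could arise.

```python
import random
import statistics

rng = random.Random(1)


def draw_individual_score():
    """Score of one simulated individual from a hypothetical two-subgroup population."""
    # Assumption: half the population scores around 40, the other half around 60.
    group_mean = 40.0 if rng.random() < 0.5 else 60.0
    return rng.gauss(group_mean, 3.0)


sample = [draw_individual_score() for _ in range(1000)]

# The summary statistic is a property of the sample as a whole.
sample_mean = statistics.mean(sample)

# How many individuals actually have a score close to that summary statistic?
near_mean = [s for s in sample if abs(s - sample_mean) <= 2.0]
fraction_near_mean = len(near_mean) / len(sample)

print(f"sample mean: {sample_mean:.1f}")
print(f"fraction of individuals within 2 points of the mean: {fraction_near_mean:.3f}")
```

The sample mean is a perfectly good estimate of the population parameter, yet hardly any simulated individual has a score near it. Attributing the estimated value to "anyone" in the population would describe almost nobody.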

And that is the situation when all the assumptions of our measurement theory and rules of inference apply. When we introduce emergence, violation of the ergodic principle, etc., property attribution based on measurement outcomes becomes even more problematic. Temperature exists as a property of ensembles of particles, but a single individual particle cannot be attributed the property "temperature". There is a theory, though, that links particle mechanics to ensemble dynamics (thermodynamics): statistical mechanics. It also contains the word "statistical", but the analogy should probably end there: the techniques for inductive inference we use in psychology can only generalise to population parameters, and they do so in a very straightforward way (the theory of measurement error).

(Should I mention our model of nested scales of constraint and my position on inferring scaling phenomena? ...nah... some other time)

Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110(2), 203–219. doi:10.1037/0033-295X.110.2.203
Ellis, J. L., & Wollenberg, A. L. (1993). Local homogeneity in latent trait models. A characterization of the homogeneous monotone IRT model. Psychometrika, 58(3), 417–429. doi:10.1007/BF02294649
Fiedler, K., & Juslin, P. (2006). Information Sampling and Adaptive Cognition.
