Friday, 5 July 2013

Representative samples, Generalisations, Populations and Property attribution

This is a reply to a reply to my reply to a post by +Daniel Simons 

What are described as examples of generalisation problems are in fact problems that occur when a sample is not representative of the population in which you expected to observe a phenomenon, variable, or trait. In general, it is the phenomenon predicted by a theory or hypothesis that determines what the population is. That is because the population you sample from is a theoretical construct, a mathematical model whose parameters you are going to estimate (e.g., Fiedler & Juslin, 2006). The parameters define a probability distribution of values around the true value of the predicted population trait. Therefore, the population is always defined as the ensemble in which the true value of the trait can be observed as the outcomes of a measurement procedure.
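
A minimal sketch of what I mean, with made-up numbers (the trait value, spread, and sample size below are all hypothetical): the population is a statistical model, the measurement outcomes are draws from the distribution its parameters define, and inference runs from the sample statistic back to that parameter and nothing more specific.

```python
import numpy as np

rng = np.random.default_rng(2013)

# The "population" as a statistical model: a hypothetical true value of the
# trait (mu_true) and a spread of measurement outcomes around it (sigma).
mu_true, sigma = 0.80, 0.10

# A convenience sample of n measurement outcomes drawn from that model.
n = 50
sample = rng.normal(mu_true, sigma, size=n)

# Inference goes from the sample statistic to the population parameter...
mu_hat = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)
print(f"estimate = {mu_hat:.3f} +/- {1.96 * se:.3f} (true value {mu_true})")
# ...and not to any individual member of the ensemble.
```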

(You remember I don't believe in true scores, right?)


If the phenomenon of “racial bias” is operationalised in a specific culture, it is likely that the racial bias theory used will tell a researcher to select stimuli capable of measuring bias in that culture. Still, one would generalise the results to a population in which racial bias is, in principle, an observable phenomenon. Unless, of course, this theory can only predict a very specific bias in a very specific culture. I am not an expert, but I cannot believe every culture needs its own personal racial bias theory.

In the same way, if a researcher is only interested in discrimination between the colours red and green, then the results do not generalise to a population that includes people with deuteranopia. But that is also specified by the phrase "capable of performing the task", which is the same as "a population in which the phenomenon may be observed".

If you use the red-green discrimination experiment as a measurement procedure in a sample that is not representative of a population in which the phenomenon may be observed (e.g., many participants with deuteranopia), it would be wrong to conclude that humans in general cannot discriminate between the two colours. But the cause is not the measurement procedure, the predictions, or the theory that prompted the predictions and the measurement procedure: it is a problem of representativeness of the sample.
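
A toy simulation of that scenario (the accuracies, sample size, and trial counts are invented for illustration): the same measurement procedure yields a very different sample statistic when the sample contains many deuteranopic participants, even though nothing about the procedure, the predictions, or the theory has changed.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical accuracies on a red-green discrimination task.
p_normal, p_deuteranope = 0.95, 0.50   # roughly at chance for deuteranopes
n = 200

def mean_accuracy(prop_deuteranope):
    """Simulate one sample of n participants and return its mean task accuracy."""
    is_deut = rng.random(n) < prop_deuteranope
    p = np.where(is_deut, p_deuteranope, p_normal)
    # 100 trials per participant, binomially distributed correct responses
    return (rng.binomial(100, p) / 100).mean()

# Population of interest: people capable of performing the task.
print("representative sample:  ", round(mean_accuracy(0.00), 2))
# Convenience sample that happens to contain many deuteranopic participants.
print("unrepresentative sample:", round(mean_accuracy(0.40), 2))
```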

If one is truly concerned about representation, the proper thing to do, in my opinion, is to replicate the results in a representative sample. That is not as convenient as convenience sampling, but still much more convenient than building a Large Hadron Collider or sending satellites into space in order to observe phenomena. Therefore, a smart journalist who reads a generalisation warning in a paper about a very general human quality, such as emotion, attention, stress, anxiety, or happiness influencing some decision, judgement, performance, or overt behaviour, should ask: "But why didn't you bother to measure your variables in a representative sample?"

(The point I actually wanted to make was)

If you operationalise the phenomenon of colour discrimination as an object of measurement that is observable as a summary statistic at the level of the sample (a latent trait or variable), you have in fact predicted a population in which colour discrimination may be observed, and from which you sample. You hope to have sampled enough of the population property to indeed observe it at the level of the sample. Our models do not allow (an easy way) back from the predicted or inferred trait of the population to the specific characteristics of an individual in that population, or: "anyone with normal colour perception" (e.g., Borsboom, Mellenbergh, & Van Heerden, 2003; Ellis & Wollenberg, 1993).
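
A small made-up simulation of why there is no easy way back (every variable, coupling strength, and sample size below is hypothetical): a population-level, between-person association can be strongly positive while the within-person association is negative for every single individual, so the sample-level statistic attributes nothing to any one member of the population.

```python
import numpy as np

rng = np.random.default_rng(7)
n_persons, n_occasions = 100, 200

# Between persons: people with a higher (made-up) trait level score higher
# on both variables, producing a positive population-level correlation.
trait = rng.normal(0, 1, n_persons)

x, y = [], []
for t in trait:
    # Within each person the two variables trade off (negative coupling).
    xi = t + rng.normal(0, 0.3, n_occasions)
    yi = t - (xi - t) + rng.normal(0, 0.3, n_occasions)
    x.append(xi)
    y.append(yi)
x, y = np.array(x), np.array(y)

between = np.corrcoef(x.mean(axis=1), y.mean(axis=1))[0, 1]
within = np.mean([np.corrcoef(x[i], y[i])[0, 1] for i in range(n_persons)])
print(f"between-person r = {between:.2f}, average within-person r = {within:.2f}")
```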

That is when all the assumptions of our measurement theory and rules of inference apply. When we introduce emergence, violation of the ergodic principle, and so on, property attribution based on measurement outcomes becomes even more problematic: temperature exists as a property of ensembles of particles, but a single individual particle cannot be attributed the property "temperature". There is a theory that links particle mechanics to ensemble dynamics (thermodynamics), which is called statistical mechanics. It also contains the word "statistical", but the analogy should probably end there: the techniques for inductive inference we use in psychology can only generalise to population parameters, and they do so in a very straightforward way (theory of measurement error).
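
To make the temperature analogy concrete, here is a toy one-dimensional ideal gas (the particle mass and temperature are chosen only for illustration): the equipartition relation defines temperature through an ensemble average of kinetic energies, so the quantity simply is not defined for a single particle.

```python
import numpy as np

rng = np.random.default_rng(0)
k_B = 1.380649e-23   # Boltzmann constant (J/K)
m = 6.6e-27          # mass of a helium atom (kg), roughly
T_true = 300.0       # the ensemble property we pretend to recover (K)

# Maxwell-Boltzmann velocities in one dimension for an ensemble of particles.
n = 100_000
v = rng.normal(0.0, np.sqrt(k_B * T_true / m), size=n)

# Equipartition: <(1/2) m v^2> = (1/2) k_B T, so T is defined by the ensemble mean.
T_hat = m * np.mean(v**2) / k_B
print(f"ensemble 'temperature': {T_hat:.1f} K")

# A single particle only has a kinetic energy; calling that a temperature
# would be a category mistake.
print(f"one particle's kinetic energy: {0.5 * m * v[0]**2:.3e} J")
```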

(Should I mention our model of nested scales of constraint and my position on inferring scaling phenomena? ...nah... some other time)

Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110(2), 203–219. doi:10.1037/0033-295X.110.2.203
Ellis, J. L., & Wollenberg, A. L. (1993). Local homogeneity in latent trait models: A characterization of the homogeneous monotone IRT model. Psychometrika, 58(3), 417–429. doi:10.1007/BF02294649
Fiedler, K., & Juslin, P. (Eds.). (2006). Information sampling and adaptive cognition. Cambridge University Press.

Wednesday, 3 July 2013

Respect your elders: Fads, fashions, and folderol in psychology - Dunnette (1966)


Some reflections on novelty in psychological science

In the discussion on open data that I commented on recently, results were reported on data sharing:
Because the authors were writing in APA journals and PLoS One, respectively, they had agreed at the time of submission that they would share their data according to the journals' policies. But only 26% and 10%, respectively, did. (I got the references from a paper by Peter Götzsche; there may be others of which I am unaware.)
Yes, there are other studies, interestingly, in the historical record: plus ça change, plus c'est la même chose.

To stress the importance of efforts to change these statistics, here is an excerpt from Dunnette (1966), who reports a 1962 study in which only 13.5% of authors complied with data requests. The reasons given for being unable to comply with a request sound familiar; this is not an issue of "modern" science, it seems. (I can recommend the entire article.)

THE SECRETS WE KEEP  
We might better label this game "Dear God,  Please Don't Tell Anyone." As the name implies, it incorporates all the things we do to accomplish the aim of looking better in public than we really are. The most common variant is, of course, the tendency to bury negative results.  
I only recently became aware of the massive size of this great graveyard for dead studies when a colleague expressed gratification that only a third of his studies "turned out"—as he put it. 
Recently, a second variant of this secrecy game was discovered, quite inadvertently, by Wolins (1962) when he wrote to 37 authors to ask for the raw data on which they had based recent journal articles. 
Wolins found that of 32 who replied, 21 reported their data to be either misplaced, lost, or inadvertently destroyed. Finally, after some negotiation, Wolins was able to complete seven re-analyses on the data supplied from 5 authors. 
Of the seven, he found gross errors in three—errors so great as to clearly change the outcome of the results already reported. Thus, if we are to accept these results from Wolins' sampling, we might expect that as many as one-third of the studies in our journals contain gross miscalculations.

Thirty percent gross miscalculations may have been a high estimate, but as a 50-year prospective prediction it's not bad: Bakker & Wicherts (2011) found the "number of articles with gross errors" across 3 high-impact and 3 low-impact journals to range from 9% to 27.6%.

In the light of these (and other) historical facts and figures, maybe it's time for a historical study; there are lots of recommendations in those publications.


Again, Dunnette (1966):

THE CAUSES
[…]
When viewed against the backdrop of publication pressures prevailing in academia, the lure of large-scale support from Federal agencies, and the presumed necessity to become "visible" among one's colleagues, the insecurities of undertaking research on important questions in possibly untapped and unfamiliar areas become even more apparent. 
THE REMEDY 
[…]
1. Give up constraining commitments to theories, methods, and apparatus!
2. Adopt methods of multiple working hypotheses!
3. Put more eclecticism into graduate education!
4. Press for new values and less pretense in the academic environments of our universities!
5. Get to the editors of our psychological journals! 
THE OUTCOME: UTOPIA  
How do I envision the eventual outcome if all these recommendations were to come to pass? What would the psychologizing of the future look like and what would psychologists be up to? Chief among the outcomes, I expect, would be a marked lessening of tensions and disputes among the Great Men of our field.
I would hope that we might once again witness the emergence of an honest community of scholars all engaged in the zestful enterprise of trying to describe, understand, predict, and control human behavior.



References

Bakker, M., & Wicherts, J. M. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43(3), 666–678. doi:10.3758/s13428-011-0089-5
Dunnette, M. D. (1966). Fads, fashions, and folderol in psychology. American Psychologist, 21(4), 343–352. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/5910065
Wolins, L. (1962). Responsibility for raw data. American Psychologist, 17, 657–658. doi:10.1037/h0038819