Sunday 7 July 2013

Respect your elders: First, you watch Meehl's videotaped philosophy of psychology lectures - then we'll discuss your "pseudo-intellectual bunch of nothing"




I've never understood how it is possible that a reviewer or editor of a scientific journal could write something like: "This subject matter is too difficult and complex for our reader audience (to be interested in)." I once even heard a colleague exclaim that a mathematical psychology journal had found his mathematics too complex.

That can only mean the audience does not want to be educated about things it does not yet know, which is strange behaviour for scientists. An editor should instead invite the author to write a primer, possibly as supplementary material. I've seen some examples of that recently; the p-curve article to appear in JEP: General is one of them.

More often than not, psychological theories and their predictions are evaluated for their descriptive value, which means: can the reviewer relate them to his own preferred theories? This should not matter in science. Theories should be evaluated for the precision of the predictions they make, their empirical accuracy, and their logical structure (as long as they do not claim a rewrite of well-established theories based on some statistical oddities: Bem, 2011).

The problem is, we do not get educated on these matters in psychology. Whether you do or not seems to depend, trivially, on whether there is a professor at your university who knows about these things.

(How lucky they were in Minnesota!)

It's plain and simple: if we really want psychology to be taken seriously as a scientific endeavour, we need to discuss it at the level of metatheory. How do we evaluate theories? What is their verisimilitude? What are their similarities, so that we can hope to unify them?

We need to discuss it at the level Paul Meehl discussed it.

Now, his list of publications is long, and so are the publications themselves; my list of quotes I would like to paste here is endless, and going by the popular journals, our generation of scientists is likely to doze off at anything longer than 5000 words anyway.

How about some video then? 

Twelve lectures of about 1.5 hours each, and you'll know all you need to know to have a proper discussion about the credibility of the theory you use to study the phenomena you are interested in.


(You do know TED talks last only 20 minutes or so?)

Ok, get through the first 7 at least. (This will not be a difficult task; I even enjoyed hearing him speak about the practicalities of the course.)




Recommendations of Meehl's work by others:
"After reading Meehl (1967) [and other psychologists] one wonders whether the function of statistical techniques in the social sciences is not primarily to provide a machinery for producing phony corroborations and thereby a semblance of ‘scientific progress’ where, in fact, there is nothing but an increase in pseudo-intellectual garbage." (Lakatos, 1978, pp. 88–9)

Just one quote sums it up for me.

Whenever I try to evaluate what someone is claiming about the world based on their data or "theory" from the perspective of theory evaluation, they look at me like a dog that has just been shown a card trick. It is so unreal that I cannot use a word like ontology or epistemology, or ask about the measurement theory or rules of inference someone used to make a claim about the way the universe works, that I have considered leaving academia. But I guess leaving without trying to change the world is not how I was raised, or genetically determined. The quote below summarises how I feel almost exactly:
"I am prepared to argue that a tremendous amount of taxpayer money goes down the drain in research that pseudotests theories in soft psychology and that it would be a material social advance as well as a reduction in what Lakatos has called “intellectual pollution” (Lakatos, 1970, fn. 1 on p. 176) if we would quit engaging in this feckless enterprise. 
I think that if psychologists would face up to the full impact of the above criticisms, something worthwhile would have been achieved in convincing them of it. Besides, before one can motivate many competent people to improve an unsatisfactory cognitive situation by some judicious mixture of more powerful testing strategies and criteria for setting aside complex substantive theory as “not presently testable,” it is necessary to face the fact that the present state of affairs is unsatisfactory. 
My experience has been that most graduate students, and many professors, engage in a mix of defense mechanisms (most predominantly, denial), so that they can proceed as they have in the past with a good scientific conscience. The usual response is to say, in effect, “Well, that Meehl is a clever fellow and he likes to philosophize, fine for him, it’s a free country. But since we are doing all right with the good old tried and true methods of Fisherian statistics and null hypothesis testing, and since journal editors do not seem to have panicked over such thoughts, I will stick to the accepted practices of my trade union and leave Meehl’s worries to the statisticians and philosophers.” 
I cannot strongly fault a 45-year-old professor for adopting this mode of defense, even though I believe it to be intellectually dishonest, because I think that for most faculty in soft psychology the full acceptance of my line of thought would involve a painful realization that one has achieved some notoriety, tenure, economic security and the like by engaging, to speak bluntly, in a bunch of nothing." (Meehl, 1990, emphasis and markup added)

References


Meehl, P. E. (1990). Why Summaries of Research on Psychological Theories Are Often Uninterpretable. Psychological Reports, 66(1), 195–244. doi:10.2466/PR0.66.1.195-244





Friday 5 July 2013

Representative samples, Generalisations, Populations and Property attribution

This is a reply to a reply to my reply to a post by +Daniel Simons 

What is described as examples of generalisation problems are problems that occur when a sample is not representative of the population in which you expected to observe a phenomenon, variable, or trait. In general, it is the phenomenon predicted by a theory or hypothesis that determines what the population is. That's because the population you sample from is a theoretical construct, a mathematical model whose parameters you are going to estimate (e.g., Fiedler & Juslin, 2006). The parameters define a probability distribution of values around the true value of the predicted population trait. Therefore, the population is always defined as the ensemble in which the true value of the trait can be observed as the outcome of a measurement procedure.

(you remember I don't believe in true scores, right?)
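To keep the terminology straight, here is a minimal sketch of the sampling model implied above, in classical-test-theory style notation of my own (the symbols are mine, not from the post):

% Sketch only: observed scores X_i scatter around the population's "true" value tau;
% the sample mean estimates tau, with an uncertainty that shrinks as n grows.
\[
  X_i = \tau + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2_{\varepsilon}),
  \qquad \hat{\tau} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,
  \qquad \mathrm{SE}(\hat{\tau}) = \frac{\sigma_{\varepsilon}}{\sqrt{n}}.
\]

That is all the inductive machinery gives you: a statement about how well a sample statistic tracks a population parameter, not a statement about any particular member of the ensemble.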


If the phenomenon of “racial bias” is operationalised in a specific culture, it is likely that the racial bias theory used will tell a researcher to select stimuli capable of measuring bias in that culture. Still, one would generalise the results to a population in which racial bias is an observable phenomenon in principle. Unless, of course, this theory can only predict a very specific bias in a very specific culture. I am not an expert, but I cannot believe every culture needs its own personal racial bias theory.

In the same way, if a researcher is only interested in discrimination between the colours red and green, then the results do not generalise to a population that includes people with deuteranopia. But that is also specified by the phrase "capable of performing the task", which is the same as "a population in which the phenomenon may be observed".

If you use the red-green discrimination experiment as a measurement procedure in a sample that is not representative of a population in which the phenomenon may be observed (e.g., many participants with deuteranopia), it would be wrong to conclude that humans in general cannot discriminate between the two colours. But the cause is not the measurement procedure, the predictions, or the theory that prompted the predictions and the measurement procedure; it's a problem of the representativeness of the sample.

If one is truly concerned about representativeness, the proper thing to do, in my opinion, is a replication of the results in a representative sample. That is not as convenient as convenience sampling, but it is still much more convenient than building a Large Hadron Collider or sending satellites into space in order to observe phenomena. Therefore, a smart journalist who reads a generalisation warning on a paper about a very general human quality such as emotion, attention, stress, anxiety or happiness influencing some decision, judgement, performance or overt behaviour, should ask: "But why didn't you bother to measure your variables in a representative sample?"

(The point I actually wanted to make was)

If you operationalise the phenomenon of colour discrimination as an object of measurement that is observable as a summary statistic at the level of the sample (a latent trait or variable), you have actually predicted a population in which colour discrimination may be observed, and it is from that population that you sample. You hope to have sampled enough of the population property to indeed observe it at the level of the sample. Our models do not allow (an easy way) back from the predicted/inferred trait of the population to the specific characteristics of an individual in that population, or: "anyone with normal colour perception" (e.g., Borsboom et al., 2003; Ellis & Van den Wollenberg, 1993).

That is when all the assumptions of our measurement theory and rules of inference apply. When we introduce emergence, violations of the ergodic principle, and so on, property attribution based on measurement outcomes becomes even more problematic: temperature exists as a property of ensembles of particles, but a single individual particle cannot be attributed the property "temperature". There is a theory that links particle mechanics to ensemble dynamics (thermodynamics), called statistical mechanics. It also contains the word "statistical", but the analogy should probably end there: the techniques for inductive inference we use in psychology can only generalise to population parameters, and they do so in a very straightforward way (the theory of measurement error).
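To make the ergodicity point concrete, here is a toy simulation of my own (the variable names and numbers are made up for illustration; they come from none of the cited papers): a relation estimated on the ensemble of persons can be the exact opposite of the relation that holds within every single person.

# Toy demonstration: a between-person (ensemble) statistic need not hold
# for any individual (within-person) process when ergodic assumptions fail.
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_trials = 50, 100

# Each person has a stable trait level that raises both x and y,
# producing a positive association *between* persons...
traits = rng.normal(0, 5, size=n_persons)

within_rs, x_means, y_means = [], [], []
for t in traits:
    x = t + rng.normal(0, 1, size=n_trials)
    # ...but *within* each person, y goes down whenever x goes up.
    y = t - (x - t) + rng.normal(0, 0.5, size=n_trials)
    within_rs.append(np.corrcoef(x, y)[0, 1])
    x_means.append(x.mean())
    y_means.append(y.mean())

print("between-person r:     ", np.corrcoef(x_means, y_means)[0, 1])  # close to +1
print("mean within-person r: ", np.mean(within_rs))                   # close to -0.9

The ensemble estimate is strongly positive while every individual's within-person relation is strongly negative; nothing in the population-level inference licenses attributing the property to an individual.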

(Should I mention our model of nested scales of constraint and my position on inferring scaling phenomena? ...nah... some other time)

Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110(2), 203–219. doi:10.1037/0033-295X.110.2.203
Ellis, J. L., & Van den Wollenberg, A. L. (1993). Local homogeneity in latent trait models: A characterization of the homogeneous monotone IRT model. Psychometrika, 58(3), 417–429. doi:10.1007/BF02294649
Fiedler, K., & Juslin, P. (Eds.). (2006). Information sampling and adaptive cognition. Cambridge University Press.

Wednesday 3 July 2013

Respect your elders: Fads, fashions, and folderol in psychology - Dunnette (1966)


Some reflections on novelty in psychological science

In the discussion on open data that I commented on recently, results were reported on data sharing:
Because the authors were writing in APA journals and PLoS One, respectively, they had agreed at the time of submitting that they would share their data according to the journals' policies. But only 26% and 10%, respectively, did. (I got the references from a paper by Peter Gøtzsche; there may be others of which I am unaware.)
Yes, there are other studies, interestingly, in the historical record: plus ça change, plus c'est la même chose.

To stress the importance of efforts to change these statistics, here is an excerpt from Dunnette (1966), who reports on a 1962 study that found 13.5% of authors complied with data requests. The reasons given for being unable to comply with a request sound familiar; this is not an issue of "modern" science, it seems. (I can recommend the entire article.)

THE SECRETS WE KEEP  
We might better label this game "Dear God, Please Don't Tell Anyone." As the name implies, it incorporates all the things we do to accomplish the aim of looking better in public than we really are. The most common variant is, of course, the tendency to bury negative results.
I only recently became aware of the massive size of this great graveyard for dead studies when a colleague expressed gratification that only a third of his studies "turned out"—as he put it.
Recently, a second variant of this secrecy game was discovered, quite inadvertently, by Wolins (1962) when he wrote to 37 authors to ask for the raw data on which they had based recent journal articles.
Wolins found that of 32 who replied, 21 reported their data to be either misplaced, lost, or inadvertently destroyed. Finally, after some negotiation, Wolins was able to complete seven re-analyses on the data supplied from 5 authors. 
Of the seven, he found gross errors in three—errors so great as to clearly change the outcome of the results already reported. Thus, if we are to accept these results from Wolins' sampling, we might expect that as many as one-third of the studies in our journals contain gross miscalculations."

30% gross miscalculations might have been a high estimate, but as a 50-year prospective prediction it's not bad: Bakker & Wicherts (2011) found the "number of articles with gross errors" across 3 high-impact and 3 low-impact journals to range from 9% to 27.6%.

In the light of these (and other) historical facts and figures, maybe it's time for a historical study; there are lots of recommendations in those publications.


Again Dunnette (1966):

THE CAUSES
[…]
When viewed against the backdrop of publication pressures prevailing in academia, the lure of large-scale support from Federal agencies, and the presumed necessity to become "visible" among one's colleagues, the insecurities of undertaking research on important questions in possibly untapped and unfamiliar areas become even more apparent. 
THE REMEDY 
[…]
1. Give up constraining commitments to theories, methods, and apparatus!
2. Adopt methods of multiple working hypotheses!
3. Put more eclecticism into graduate education!
4. Press for new values and less pretense in the academic environments of our universities!
5. Get to the editors of our psychological journals! 
THE OUTCOME: UTOPIA  
How do I envision the eventual outcome if all these recommendations were to come to pass? What would the psychologizing of the future look like and what would psychologists be up to? Chief among the outcomes, I expect, would be a marked lessening of tensions and disputes among the Great Men of our field.
I would hope that we might once again witness the emergence of an honest community of scholars all engaged in the zestful enterprise of trying to describe, understand, predict, and control human behavior.



References

Bakker, M., & Wicherts, J. M. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43(3), 666–678. doi:10.3758/s13428-011-0089-5
Dunnette, M. D. (1966). Fads, fashions, and folderol in psychology. American Psychologist, 21(4), 343–352. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/5910065
Wolins, L. (1962). Responsibility for raw data. American Psychologist, 17, 657–658. doi:10.1037/h0038819