The science journalists at the Volkskrant rarely miss an opportunity to engage in psychology bashing. The recent failure of seventeen laboratories to replicate a study by Fritz Strack and colleagues (1988) was therefore too good to miss.

The study was a test of the facial feedback hypothesis. This hypothesis, which was originally suggested by Darwin, implies that certain facial expressions and postures exert an influence on emotional responses. Research on this theory had originally been hampered by the difficulty of manipulating facial expressions unobtrusively. The innovative contribution of Strack and colleagues was to suggest a method by which frowns and smiles could be induced without having to give specific instructions about facial muscles. Holding a pen between the teeth facilitates smiling, whereas holding it with protruding lips inhibits smiling. In support of the facial feedback theory, subjects reported more intense humor responses when cartoons were presented under conditions that facilitated smiling than under conditions that inhibited it.

But readers of the September 7 issue of the Volkskrant were not informed about the innovativeness of the experimental manipulation. To them, a study in which research participants rate the funniness of cartoons while holding a pen either between their lips or between their teeth must indeed seem ridiculous, and must seem to justify the hilarious title 'Alweer zo’n psychologisch lachertje' ("Yet another psychological joke"), even though that title would have been more fitting for a tabloid than for a paper that perceives itself as serious. (The title of the online publication differs from the one in the paper of September 7; the print article is available through the news service Blendle, ed.)

By further describing the original finding as the key support for the facial feedback hypothesis, the replication failure could be made to look devastating for psychology, pulling the rug from under yet another psychological theory.

Finally, by concluding that the original findings were “nep” (according to my dictionary “bogus”, “fake”, “swindle”) and by justifying the need for replications of key experiments as at least partly due to the multiple frauds of Stapel, the report managed to intimate that the original finding might have been fraudulent.

As I argued in “Scientific misconduct and the myth of self-correction in science” (2012; coauthored with Postmes and Spears; see also Stroebe & Strack, 2013), replication is a poor fraud detector. There are always multiple possible reasons for a failure to replicate a particular research finding, so even multiple failures to replicate do not indicate fraud. Similarly, successful replication is no indication of the absence of fraud. In order to get their fraudulent articles published, fraudsters have to develop plausible scientific hypotheses.

The difference between fraudsters and scientists is that scientists test their hypotheses empirically, whereas fraudsters support them with invented data. As there is a good chance that a scientifically plausible hypothesis would also be supported by empirical data, a careful fraudster will probably score as many hits as an honest scientist. Thus, even NWO would be unlikely to fund replication studies in order to detect fraud.

There is also no urgent need for another test of the facial feedback hypothesis, as this hypothesis is well supported by a multitude of psychological and neurological data: the manipulation of facial expressions has been shown to moderate emotional responses as well as prefrontal and amygdala activation (Price & Harmon-Jones, 2015). Although the Strack et al. experiment, as one of the early studies supporting the hypothesis, is mainly of historical importance today, their method of manipulating emotional expressions has been shown to moderate emotional responses in several subsequent experiments, albeit none using funniness ratings of cartoons as the dependent variable. A recent study even demonstrated that the two ways of holding a pen were associated with different fMRI fluctuations in areas related to the initiation of positive emotions (Chang et al., 2013; see also Hennenlotter et al., 2009).

So why did the replication fail? There are two potential reasons. First, ratings of the funniness of cartoons (or of these specific cartoons) might not be a sensitive measure of the changes in emotional response triggered by the manipulation of facial feedback. This hardly seems a sufficiently momentous insight to justify the time and resources 17 laboratories invested in replicating that study.

Second, the replication failure could have been due to changes to the experimental design made by the replicators. There is indeed one striking difference between the original and the replication study: subjects in the replication (but not in the original) were videotaped during the experiment. I could imagine that they might have felt a bit strange sitting in a laboratory with a pen in their mouth, rating cartoons. Most participants will probably have been concerned about looking rather silly. It is easy to imagine that these concerns drowned out the subtle cues emanating from their facial muscles. The fact that video cameras (or mirrors) induce self-focus has long been known to social psychologists (but apparently not to statisticians). Further research will have to show whether this was a critical moderator.

If the Volkskrant had bothered to talk to Strack, or at least to peruse the long list of publications on facial feedback given by Strack in his comment on the replication article, they would have had all the information I have just presented, including the potential influence of the videotaping on participants. However, they would then have missed the opportunity to report “Alweer zo’n psychologisch lachertje”.

