We discuss three arguments voiced by scientists who view the current outpouring of concern about replicability as overblown. The first idea is that the adoption of a low alpha level (e.g., 5%) puts reasonable bounds on the rate at which errors can enter the published literature, making false-positive effects rare enough to be considered a minor issue. This, we point out, rests on a statistical misunderstanding: the alpha level imposes no limit on the rate at which errors may arise in the literature (Ioannidis, 2005b). Second, some argue that whereas direct replication attempts are uncommon, conceptual replication attempts are common and provide an even better test of the validity of a phenomenon. We contend that performing conceptual rather than direct replication attempts interacts insidiously with publication bias, opening the door to literatures that appear to confirm the reality of phenomena that in fact do not exist. Finally, we discuss the argument that errors will eventually be pruned out of the literature if the field would just show a bit of patience. We contend that there are no plausible concrete scenarios to back up such forecasts and that what is needed is not patience but systematic reforms in scientific practice.
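The first argument's flaw can be made concrete with a little arithmetic. The share of published positives that are false depends not only on alpha but also on statistical power and on the prior probability that a tested hypothesis is true; the numbers below are illustrative assumptions, not figures from the abstract.

```python
# Minimal sketch: why alpha = .05 does not cap the error rate in the
# published literature (cf. Ioannidis, 2005). If only a small fraction of
# tested hypotheses are true, the share of *significant* results that are
# false positives can far exceed 5%. All inputs are illustrative.

def false_positive_share(prior_true, power, alpha):
    """Fraction of significant results that are false positives."""
    true_pos = prior_true * power          # true effects detected
    false_pos = (1 - prior_true) * alpha   # null effects passing alpha
    return false_pos / (true_pos + false_pos)

# Suppose 10% of tested hypotheses are true and studies have 50% power:
share = false_positive_share(prior_true=0.10, power=0.50, alpha=0.05)
print(f"{share:.0%} of significant findings are false")  # → 47%
```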
Functional magnetic resonance imaging (fMRI) studies of emotion, personality, and social cognition have drawn much attention in recent years, with high-profile studies frequently reporting extremely high (e.g., >.8) correlations between brain activation and personality measures. We show that these correlations are higher than should be expected given the (evidently limited) reliability of both fMRI and personality measures. The high correlations are all the more puzzling because method sections rarely contain much detail about how the correlations were obtained. We surveyed the authors of 55 articles that reported findings of this kind to determine how these correlations were computed. More than half acknowledged using a strategy that computes separate correlations for individual voxels and reports the mean of only those voxels exceeding chosen thresholds. We show how this nonindependent analysis inflates correlations while yielding reassuring-looking scattergrams. This analysis technique was used to obtain the vast majority of the implausibly high correlations in our survey sample. In addition, we argue that, in some cases, other analysis problems likely created entirely spurious correlations. We outline how the data from these studies could be reanalyzed with unbiased methods to provide accurate estimates of the correlations in question, and we urge authors to perform such reanalyses. The underlying problems described here appear to be common in fMRI research of many kinds, not just in studies of emotion, personality, and social cognition.
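The inflation produced by the select-then-average strategy is easy to demonstrate by simulation. In the toy example below every voxel is pure noise (true correlation zero), yet averaging only the voxels that clear a threshold yields a high mean correlation; subject count, voxel count, and threshold are illustrative assumptions, not the surveyed studies' parameters.

```python
# Minimal sketch of how the nonindependent "select-then-average" analysis
# inflates correlations. Voxel data are pure noise (true r = 0), yet the
# mean |r| of the voxels surviving the threshold is necessarily high.
import math
import random
import statistics

random.seed(0)
n_subjects, n_voxels, r_threshold = 16, 5000, 0.6  # illustrative values

behavior = [random.gauss(0, 1) for _ in range(n_subjects)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

selected = []
for _ in range(n_voxels):
    voxel = [random.gauss(0, 1) for _ in range(n_subjects)]  # no signal
    r = pearson(voxel, behavior)
    if abs(r) > r_threshold:      # keep only "significant" voxels...
        selected.append(abs(r))

# ...then report their mean, as more than half the surveyed papers did:
print(f"{len(selected)} noise voxels pass; "
      f"mean |r| = {statistics.mean(selected):.2f}")
```

With 16 subjects, sampling noise alone pushes dozens of null voxels past |r| = .6, so the reported mean is guaranteed to exceed the threshold that selected it.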
Human memory is imperfect; thus, periodic review is required for the long-term preservation of knowledge and skills. However, students at every educational level are challenged by an ever-growing amount of material to review and an ongoing imperative to master new material. We developed a method for efficient, systematic, personalized review that combines statistical techniques for inferring individual differences with a psychological theory of memory. The method was integrated into a semester-long middle-school foreign-language course via retrieval-practice software. Using a cumulative exam administered after the semester's end, we compared time-matched review strategies and found that personalized review yielded a 16.5% boost in course retention over current educational practice (massed study) and a 10.0% improvement over a one-size-fits-all strategy for spaced study.
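The general shape of such a personalized scheduler can be sketched in a few lines. This is a toy model, not the paper's actual algorithm (which combines Bayesian inference of individual differences with a more detailed theory of memory): it assumes a simple exponential forgetting curve with a per-item decay rate and reviews the items whose predicted recall is lowest. The vocabulary items and decay rates are invented for illustration.

```python
# Toy sketch of personalized review: predict recall with an exponential
# forgetting curve and queue the most at-risk items first. Decay rates
# would in practice be inferred from each student's response history.
import math

def predicted_recall(hours_since_review, decay_rate):
    """Exponential forgetting curve: P(recall) = exp(-decay * t)."""
    return math.exp(-decay_rate * hours_since_review)

# Per-item state: (item, hours since last review, estimated decay/hour).
items = [("hola", 48, 0.02), ("gracias", 12, 0.05), ("siempre", 96, 0.005)]

# Review the items with the lowest predicted recall first.
queue = sorted(items, key=lambda it: predicted_recall(it[1], it[2]))
print([name for name, *_ in queue])  # → ['hola', 'gracias', 'siempre']
```

Note that the ordering is not simply "longest since last review": the item seen 96 hours ago ranks last because its estimated decay rate is low.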
Bargh et al. (2001) reported two experiments in which people were exposed to words related to achievement (e.g., strive, attain) or to neutral words and then performed a demanding cognitive task. Performance on the task was enhanced after exposure to the achievement-related words. Bargh and colleagues concluded that better performance was due to the achievement words having activated a "high-performance goal". Because the paper has been cited well over 1100 times, an attempt to replicate its findings would seem warranted. Two direct replication attempts were performed. The first experiment (n = 98) found no effect of priming, and the means were in the direction opposite to those reported by Bargh and colleagues. The second experiment followed up on the observation by Bargh et al. (2001) that high-performance-goal priming was enhanced by a 5-minute delay between priming and test. Adding such a delay, we still found no evidence for high-performance-goal priming (n = 66). These failures to replicate, along with other recent results, suggest that the literature on goal priming requires some skeptical scrutiny.
It is often assumed that implicit learning of skills based on predictive relationships proceeds independently of awareness. To test this idea, four groups of subjects played a game in which a fast-moving "demon" made a brief appearance at the bottom of the computer screen, then disappeared behind a V-shaped occluder, and finally reappeared briefly in either the upper-left or upper-right quadrant of the screen. Points were scored by clicking on the demon during the final reappearance phase. Demons differed in several visible characteristics, including color, horn height, and eye size. For some subjects, horn height perfectly predicted which side the demon would reappear on. Among subjects not told the rule, the subset who demonstrated at the end of the experiment that they had spontaneously discovered it showed strong evidence of exploiting it, anticipating the demon's arrival and lying in wait for it. Those who could not verbalize the rule performed no better than a control group for whom the demons moved unpredictably. The implications of this tight linkage between conscious awareness and implicit skill learning are discussed.
The authors performed a meta-analysis of the distributed practice effect to illuminate the effects of temporal variables that have been neglected in previous reviews. This review found 839 assessments of distributed practice in 317 experiments located in 184 articles. Effects of spacing (consecutive massed presentations vs. spaced learning episodes) and lag (less spaced vs. more spaced learning episodes) were examined, as were expanding interstudy interval (ISI) effects. Analyses suggest that ISI and retention interval operate jointly to affect final-test retention; specifically, the ISI producing maximal retention increased as retention interval increased. Areas needing future research and theoretical implications are discussed.
People often have trouble performing 2 relatively simple tasks concurrently. The causes of this interference and its implications for the nature of attentional limitations have been controversial for 40 years, but recent experimental findings are beginning to provide some answers. Studies of the psychological refractory period effect indicate a stubborn bottleneck encompassing the process of choosing actions and probably memory retrieval generally, together with certain other cognitive operations. Other limitations associated with task preparation, sensory-perceptual processes, and timing can generate additional and distinct forms of interference. These conclusions challenge widely accepted ideas about attentional resources and probe reaction time methodologies. They also suggest new ways of thinking about continuous dual-task performance, effects of extraneous stimulation (e.g., stop signals), and automaticity. Implications for higher mental processes are discussed.
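The central-bottleneck account of the psychological refractory period has a simple formal core that can be simulated. In the standard model each task comprises perceptual, central, and motor stages, and Task 2's central stage must wait until Task 1's central stage finishes; the stage durations below are illustrative assumptions, not empirical estimates.

```python
# Sketch of the central-bottleneck model of the PRP effect. Stage
# durations (ms) are illustrative: P = perceptual, C = central (the
# bottleneck stage), M = motor; SOA = stimulus onset asynchrony.

def rt2(soa, p1=100, c1=200, p2=100, c2=200, m2=100):
    """Task 2 response time, measured from stimulus 2 onset."""
    central2_start = max(soa + p2, p1 + c1)  # C2 must wait for C1 to end
    return central2_start + c2 + m2 - soa

# At short SOAs, RT2 is slowed by queuing at the bottleneck; the slowing
# declines with slope -1 and vanishes at long SOAs:
for soa in (0, 100, 200, 300, 400):
    print(soa, rt2(soa))   # 600, 500, 400, 400, 400
```

The signature slope of −1 at short SOAs, flattening once Task 1's central stage no longer delays Task 2, is the classic empirical fingerprint of such a bottleneck.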
Williams and Bargh (2008) reported an experiment in which participants were simply asked to plot a single pair of points on a piece of graph paper, with the coordinates provided by the experimenter specifying a pair of points that lay at one of three different distances (close, intermediate, or far, relative to the range available on the graph paper). The participants who had graphed a more distant pair reported themselves as being significantly less close to members of their own family than did those who had plotted a more closely situated pair. In another experiment, people's estimates of the caloric content of different foods were reportedly altered by the same type of spatial distance priming. Direct replications of both results were attempted, with precautions to ensure that the experimenter did not know which condition the participant was assigned to. The results showed no hint of the priming effects reported by Williams and Bargh (2008).
Every day, students and instructors are faced with the decision of when to study information. The timing of study, and how it affects memory retention, has been explored for many years in research on human learning. This research has shown that performance on final tests of learning is improved if multiple study sessions are separated ("spaced" apart) in time rather than massed in immediate succession. In this article, we review research findings on the types of learning that benefit from spaced study, demonstrations of these benefits in educational settings, and recent research on the time intervals during which spaced study should occur in order to maximize memory retention. We conclude with a list of recommendations on how spacing might be incorporated into everyday instruction.
Quantitative theories with free parameters often gain credence when they closely fit data. This is a mistake. A good fit reveals nothing about the flexibility of the theory (how much it cannot fit), the variability of the data (how firmly the data rule out what the theory cannot fit), or the likelihood of other outcomes (perhaps the theory could have fit any plausible result), and a reader needs all 3 pieces of information to decide how much the fit should increase belief in the theory. The use of good fits as evidence is supported neither by philosophers of science nor by the history of psychology; there seem to be no examples of a theory supported mainly by good fits that has led to demonstrable progress. A better way to test a theory with free parameters is to determine how the theory constrains possible outcomes (i.e., what it predicts), assess how firmly actual outcomes agree with those constraints, and determine whether plausible alternative outcomes would have been inconsistent with the theory, allowing for the variability of the data.
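The point about flexibility is easy to demonstrate: a model with enough free parameters fits any data, including pure noise, so the fit itself carries no evidential weight. The example below is an illustrative extreme, fitting an 8-parameter polynomial to 8 random points.

```python
# Minimal demonstration that a sufficiently flexible model fits anything:
# a degree-7 polynomial (8 free parameters) passes exactly through 8
# arbitrary random points, so its "good fit" says nothing about the data.
import random

random.seed(1)
xs = list(range(8))
ys = [random.gauss(0, 1) for _ in xs]   # arbitrary data: no structure

def lagrange_fit(xs, ys):
    """Interpolating polynomial through all points (Lagrange form)."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

model = lagrange_fit(xs, ys)
residuals = [abs(model(x) - y) for x, y in zip(xs, ys)]
print(f"max residual: {max(residuals):.2e}")   # essentially zero
```

Because the fit is perfect no matter what values `ys` takes, it cannot discriminate between outcomes the theory "predicts" and outcomes it would have accommodated anyway, which is exactly the missing information the abstract identifies.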