Abstract
Researchers often conclude an effect is absent when a null-hypothesis significance test yields a nonsignificant p value. However, it is neither logically nor statistically correct to conclude an effect is absent when a hypothesis test is not significant. We present two methods to evaluate the presence or absence of effects: equivalence testing (based on frequentist statistics) and Bayes factors (based on Bayesian statistics). In four examples from the gerontology literature, we illustrate different ways to specify alternative models that can be used to reject the presence of a meaningful or predicted effect in hypothesis tests. We provide detailed explanations of how to calculate, report, and interpret Bayes factors and equivalence tests. We also discuss how to design informative studies that can provide support for a null model or for the absence of a meaningful effect. We discuss the conceptual differences between Bayes factors and equivalence tests and note when and why they might lead to similar or different inferences in practice. It is important that researchers are able to falsify predictions and to quantify support for predicted null effects. Bayes factors and equivalence tests provide useful statistical tools to improve inferences about null effects.
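As a rough illustration of the Bayesian approach described above (this is our sketch, not one of the article's four worked examples), the Python snippet below computes a one-sample JZS Bayes factor by numerical integration, following Rouder et al. (2009); the prior scale r = 0.707 and the simulated data are illustrative assumptions. A BF10 well below 1 quantifies support for the null model over the alternative.

```python
import numpy as np
from scipy import stats, integrate

def jzs_bf10(t, n, r=0.707):
    """One-sample JZS Bayes factor (Rouder et al., 2009), alternative vs. null.

    t : observed t statistic; n : sample size; r : scale of the Cauchy prior
    on the standardized effect size under the alternative model.
    """
    v = n - 1  # degrees of freedom
    null_marginal = (1 + t**2 / v) ** (-(v + 1) / 2)

    def integrand(g):
        k = 1 + n * g * r**2
        return (k ** -0.5
                * (1 + t**2 / (k * v)) ** (-(v + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))

    alt_marginal, _ = integrate.quad(integrand, 0, np.inf)
    return alt_marginal / null_marginal

# Simulated data in which the true effect is absent (illustrative only).
rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=150)
t_stat = stats.ttest_1samp(x, popmean=0).statistic
bf10 = jzs_bf10(t_stat, len(x))
print(f"BF10 = {bf10:.3f}  (BF01 = {1 / bf10:.2f} in favor of the null)")
```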
Psychologists must be able to test both for the presence of an effect and for the absence of an effect. In addition to testing against zero, researchers can use the two one-sided tests (TOST) procedure to test for equivalence and reject the presence of a smallest effect size of interest (SESOI). The TOST procedure can be used to determine if an observed effect is surprisingly small, given that a true effect at least as extreme as the SESOI exists. We explain a range of approaches to determine the SESOI in psychological science and provide detailed examples of how equivalence tests should be performed and reported. Equivalence tests are an important extension of the statistical tools psychologists currently use and enable researchers to falsify predictions about the presence, and declare the absence, of meaningful effects.
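To make the procedure concrete (a minimal sketch, not the article's reporting examples), the code below runs TOST for two independent groups with the SESOI specified as Cohen's d = 0.5; converting the d bounds to raw bounds via the pooled standard deviation and using an alpha of .05 are assumptions of this sketch.

```python
import numpy as np
from scipy import stats

def tost_two_sample(x, y, d_sesoi=0.5, alpha=0.05):
    """Two one-sided tests (TOST) for two independent groups.

    The SESOI is given as Cohen's d and converted to raw-score equivalence
    bounds using the pooled standard deviation. Equivalence is declared when
    both one-sided tests are significant, i.e. max(p_lower, p_upper) < alpha.
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sd_pool = np.sqrt(((nx - 1) * np.var(x, ddof=1)
                       + (ny - 1) * np.var(y, ddof=1)) / df)
    low, high = -d_sesoi * sd_pool, d_sesoi * sd_pool
    se = sd_pool * np.sqrt(1 / nx + 1 / ny)
    diff = np.mean(x) - np.mean(y)
    p_lower = stats.t.sf((diff - low) / se, df)    # H0: diff <= lower bound
    p_upper = stats.t.cdf((diff - high) / se, df)  # H0: diff >= upper bound
    p_tost = max(p_lower, p_upper)
    return p_tost, p_tost < alpha

# Illustrative data: a true difference well inside the equivalence bounds.
rng = np.random.default_rng(7)
x = rng.normal(0.00, 1.0, size=80)
y = rng.normal(0.05, 1.0, size=80)
p, equivalent = tost_two_sample(x, y, d_sesoi=0.5)
print(f"TOST p = {p:.4f}; reject effects as large as |d| = 0.5: {equivalent}")
```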
For almost half a century, Paul Meehl educated psychologists about how the mindless use of null-hypothesis significance tests made research on theories in the social sciences basically uninterpretable. In response to the replication crisis, reforms in psychology have focused on formalizing procedures for testing hypotheses. These reforms were necessary and influential. However, as an unexpected consequence, psychological scientists have begun to realize that they may not be ready to test hypotheses. Forcing researchers to prematurely test hypotheses before they have established a sound “derivation chain” between test and theory is counterproductive. Instead, various nonconfirmatory research activities should be used to obtain the inputs necessary to make hypothesis tests informative. Before testing hypotheses, researchers should spend more time forming concepts, developing valid measures, establishing the causal relationships between concepts and the functional form of those relationships, and identifying boundary conditions and auxiliary assumptions. Providing these inputs should be recognized and incentivized as a crucial goal in itself. In this article, we discuss how shifting the focus to nonconfirmatory research can tie together many loose ends of psychology’s reform movement and help us to develop strong, testable theories, as Paul Meehl urged.
Replication of published results is crucial for ensuring the robustness and self-correction of research, yet replications are scarce in many fields. Replicating researchers will therefore often have to decide which of several relevant candidates to target for replication. Formal strategies for efficient study selection have been proposed, but none has been explored for practical feasibility, a prerequisite for validation. Here we move one step closer to efficient replication study selection by exploring the feasibility of a particular selection strategy that estimates replication value as a function of citation impact and sample size (Isager, van 't Veer, & Lakens, 2021). We tested our strategy on a sample of fMRI studies in social neuroscience. We first report our efforts to generate a representative candidate set of replication targets. We then explore the feasibility and reliability of estimating replication value for the targets in our set, resulting in a dataset of 1358 studies ranked on the value of prioritising them for replication. In addition, we carefully examine possible measures, test auxiliary assumptions, and identify boundary conditions of measuring value and uncertainty. We end our report by discussing how future validation studies might be designed. Our study demonstrates the importance of investigating how to implement study selection strategies in practice. Our sample and study design can be extended to explore the feasibility of other formal study selection strategies that have been proposed.
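To give a flavor of the kind of estimator involved (the exact formula is specified in Isager, van 't Veer, & Lakens, 2021; the functional form below is a hypothetical stand-in, not their estimator), a ranking script in Python might look like this, with value proxied by yearly citation rate and evidential certainty proxied by sample size:

```python
# Hypothetical sketch only: replication value rises with citation impact and
# falls with sample size. The real estimator is defined in Isager et al. (2021).
import math

def replication_value(citations: int, years_since_pub: float, n: int) -> float:
    yearly_citations = citations / max(years_since_pub, 1.0)
    return yearly_citations / math.sqrt(n)  # illustrative functional form

# Placeholder records; in practice these fields would be retrieved from
# citation databases and coded from the papers themselves.
studies = [
    {"id": "study_A", "citations": 420, "years": 6, "n": 24},
    {"id": "study_B", "citations": 150, "years": 3, "n": 310},
    {"id": "study_C", "citations": 90, "years": 2, "n": 45},
]
studies.sort(key=lambda s: replication_value(s["citations"], s["years"], s["n"]),
             reverse=True)
for s in studies:
    rv = replication_value(s["citations"], s["years"], s["n"])
    print(f"{s['id']}: RV = {rv:.2f}")
```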
Social status is often metaphorically construed in terms of spatial relations such as height, size, and numerosity. This has led to the idea that social status might partially be represented by an analogue magnitude system, responsible for processing the magnitude of various physical and abstract dimensions. Accordingly, processing of social status should obey Weber's law. We conducted three studies to investigate whether social status comparisons would show two behavioral outcomes derived from Weber's law: the distance effect and the size effect. The dependent variable was the latency of status comparisons for a variety of both learned and familiar hierarchies. As predicted, and in line with previous findings, we observed a clear distance effect. However, the effect of size variation differed from the size effect hypothesized a priori, and an unexpected interaction between the two effects was observed. In conclusion, we provide a robust confirmation of previous observations of the distance effect in social status comparisons, but the shape of the size effect requires new theorizing.
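For readers unfamiliar with these effects, the predictions can be stated compactly (our paraphrase of standard magnitude-comparison theory, not a formula taken from the article):

```latex
% Weber's law: the just-noticeable difference \Delta I scales with the
% magnitude I being judged, with a constant Weber fraction k:
\[
  \frac{\Delta I}{I} = k
\]
% For comparing two ranks, discriminability thus depends on their ratio:
% responses should be faster the farther apart the ranks are (distance
% effect) and slower the larger both ranks are at a fixed distance
% (size effect).
```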
Increased execution of replication studies contributes to the effort to restore the credibility of empirical research. However, a second generation of problems arises: the number of potential replication targets is seriously mismatched with available resources. Given limited resources, replication target selection should be well justified, systematic, and transparently communicated. At present, guidance on what to consider when selecting a replication target is limited to theoretical discussion, self-reported justifications, and a few formalized suggestions. In this Registered Report, we proposed a study involving the scientific community to create a list of considerations to consult when selecting a replication target in psychology. We employed a modified Delphi approach. First, we constructed a preliminary list of considerations. Second, we surveyed psychologists who had previously selected a replication target about their considerations. Third, we incorporated the results into the preliminary list and sent the updated list to a group of individuals knowledgeable about concerns regarding replication target selection. Over the course of several rounds, we established consensus on what to consider when selecting a replication target. The resulting checklist can be used to transparently communicate the rationale for selecting studies for replication.
As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand their data sets’ contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a data set. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search-engine indexing to reach a broader audience of interested parties. This Tutorial first explains terminology and standards relevant to data dictionaries and codebooks. Accompanying information on OSF presents a guided workflow of the entire process, from source data (e.g., survey answers on Qualtrics) to an openly shared data set accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we discuss freely available Web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable.
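As a minimal sketch of the endpoint of such a workflow (our illustration, not the OSF materials themselves; the columns and descriptions below are placeholders), a variable-level codebook can be derived from a data frame in Python and shared alongside the data set:

```python
# Derive a simple variable-level codebook from a data frame, one row of
# metadata per variable. Descriptions must be supplied by the researcher.
import pandas as pd

df = pd.DataFrame({
    "participant_id": [1, 2, 3],
    "age": [71, 68, 75],
    "condition": ["control", "treatment", "control"],
})

descriptions = {  # researcher-supplied; placeholder text here
    "participant_id": "Anonymized participant identifier",
    "age": "Age in years at time of testing",
    "condition": "Experimental condition assignment",
}

codebook = pd.DataFrame({
    "variable": df.columns,
    "dtype": [str(t) for t in df.dtypes],
    "n_missing": df.isna().sum().values,
    "n_unique": df.nunique().values,
    "description": [descriptions.get(c, "") for c in df.columns],
})
codebook.to_csv("codebook.csv", index=False)  # share alongside the data set
```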
Interpersonal touch is a powerful aspect of social interaction that we expect to be particularly important for emotional communication. We studied the capacity of closely acquainted humans to signal the meaning of several word cues (e.g., gratitude, sadness) using touch sensation alone. Participants communicated all cues with above-chance performance. We show that emotionally close people can accurately signal the meaning of different words through touch, and that performance is affected by the amount of contextual information available. Even with minimal context and feedback, both attention-getting and love were communicated surprisingly well. Neither the type of close relationship nor self-reported comfort with touch significantly affected performance.
Robust scientific knowledge is contingent upon replication of original findings. However, replicating researchers are constrained by resources, and will almost always have to choose one replication effort to focus on from a set of potential candidates. To select a candidate efficiently in these cases, we need methods for deciding which out of all candidates considered would be the most useful to replicate, given some overall goal researchers wish to achieve. In this article we assume that the overall goal researchers wish to achieve is to maximize the utility gained by conducting the replication study. We then propose a general rule for study selection in replication research based on the replication value of the set of claims considered for replication. The replication value of a claim is defined as the maximum expected utility we could gain by conducting a replication of the claim, and is a function of (a) the value of being certain about the claim, and (b) uncertainty about the claim based on current evidence. We formalize this definition in terms of a causal decision model, utilizing concepts from decision theory and causal graph modeling. We discuss the validity of using replication value as a measure of expected utility gain, and we suggest approaches for deriving quantitative estimates of replication value. Our goal in this article is not to define concrete guidelines for study selection, but to provide the necessary theoretical foundations on which such concrete guidelines could be built.
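One standard way to write this definition down, consistent with the decision-theoretic framing above (the notation is ours, not necessarily the article's), is as the expected value of perfect information about the state s that the claim concerns:

```latex
\[
  \mathrm{RV}(C)
  \;=\;
  \mathbb{E}_{s}\!\left[\max_{a} U(a, s)\right]
  \;-\;
  \max_{a}\,\mathbb{E}_{s}\!\left[U(a, s)\right]
\]
% Here a ranges over available actions and the expectation is taken over
% current uncertainty about s. The first term captures (a) the value of
% being certain about the claim; the gap between the terms grows with
% (b) the uncertainty that remains under current evidence.
```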
Translational Abstract: Replication (redoing a study using the same procedures) is an important part of checking the robustness of claims in the psychological literature. The practice of replicating original studies has been woefully devalued for many years, but this is now changing. Recent calls for improving the quality of research in psychology have generated a surge of interest in funding, conducting, and publishing replication studies. Because many studies have never been replicated, and researchers have limited time and money to perform replication studies, researchers must decide which studies are the most important to replicate. This way, scientists learn the most given limited resources. In this article, we lay out what it means to ask which study is the most important to replicate, and we propose a general decision rule for picking a study to replicate. That rule depends on a concept we call replication value. Replication value is a function of the importance of a study and how uncertain we are about its findings. We explain how researchers can think precisely about the value of replication studies. We then discuss when and how it makes sense to use replication value as a measure of how valuable a replication study would be, and which factors funders, journals, or scientists could consider when making that judgment.