Quantifying evidence is an inherent aim of empirical science, yet the customary statistical methods in psychology do not communicate the degree to which the collected data serve as evidence for the tested hypothesis. To estimate the distribution of the strength of evidence that individual significant results offer in psychology, we calculated Bayes factors (BF) for 287,424 findings of 35,515 articles published in 293 psychological journals between 1985 and 2016. Overall, 55% of all analyzed results were found to provide BF > 10 (often labeled as strong evidence) for the alternative hypothesis, while more than half of the remaining results did not reach the level of BF = 3 (evidence below this level is labeled anecdotal). Our results suggest that at least 82% of all published psychological articles contain one or more significant results that do not provide BF > 10 for the hypothesis. We conclude that because the threshold of acceptance has been set too low for psychological findings, a substantial proportion of the published results have weak evidential support.
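For readers unfamiliar with how such Bayes factors are obtained, the sketch below computes a default Jeffreys–Zellner–Siow (JZS) Bayes factor for a one-sample t-test via Rouder and colleagues' integral formulation. This is a minimal illustration, not necessarily the exact computation used in the article; the function name and the default Cauchy prior scale r = 0.707 are assumptions.

```python
import numpy as np
from scipy import integrate

def jzs_bf10(t, n, r=0.707):
    """Default (JZS) Bayes factor BF10 for a one-sample t-test.
    t: observed t statistic; n: sample size; r: Cauchy prior scale."""
    nu = n - 1
    # Marginal likelihood under H0 (standardized effect size = 0),
    # up to a constant shared with the H1 integral below.
    h0 = (1.0 + t**2 / nu) ** (-(nu + 1) / 2)

    # Under H1, integrate over the mixing parameter g, which carries an
    # inverse-chi-square(1) prior; this mixture yields a Cauchy prior
    # with scale r on the standardized effect size.
    def integrand(g):
        if g < 1e-12:          # integrand is numerically zero near g = 0
            return 0.0
        a = 1.0 + n * g * r**2
        return (a ** -0.5
                * (1.0 + t**2 / (a * nu)) ** (-(nu + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1.0 / (2 * g)))

    h1, _ = integrate.quad(integrand, 0, np.inf)
    return h1 / h0
```

On this scale, a just-significant result (t around 2) typically yields only anecdotal evidence (BF10 near 1–3), while much larger t values are needed to clear the BF > 10 threshold discussed above.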
Hypnosis and hypnotic suggestions are gradually gaining popularity within the consciousness community as established tools for the experimental manipulation of illusions of involuntariness, hallucinations and delusions. However, hypnosis is still far from being a widespread instrument; a crucial hindrance to its uptake is the time that must be invested in identifying people high and low in responsiveness to suggestion. In this study, we introduced an online assessment of hypnotic response and estimated the extent to which the scores and psychometric properties of an online screening differ from an offline one. We propose that online screening of hypnotic response is viable, as it reduces measured responsiveness only to a slight extent. The application of online screening may prompt researchers to run large-scale studies with more heterogeneous samples, which would help overcome some of the issues underlying the current replication crisis in psychology.
In the traditional statistical framework, nonsignificant results leave researchers in a state of suspended disbelief. In this study, we examined, empirically, the treatment and evidential impact of nonsignificant results. Our specific goals were twofold: to explore how psychologists interpret and communicate nonsignificant results and to assess how much these results constitute evidence in favor of the null hypothesis. First, we examined all nonsignificant findings mentioned in the abstracts of the 2015 volumes of Psychonomic Bulletin & Review, Journal of Experimental Psychology: General, and Psychological Science (N = 137). In 72% of these cases, nonsignificant results were misinterpreted, in that the authors inferred that the effect was absent. Second, a Bayes factor reanalysis revealed that fewer than 5% of the nonsignificant findings provided strong evidence (i.e., BF01 > 10) in favor of the null hypothesis over the alternative hypothesis. We recommend that researchers expand their statistical tool kit in order to correctly interpret nonsignificant results and to be able to evaluate the evidence for and against the null hypothesis.
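The gap between "nonsignificant" and "evidence for the null" can be illustrated numerically. The sketch below uses the BIC approximation to the Bayes factor for a one-sample t-test (BF01 ≈ √n · (1 + t²/(n − 1))^(−n/2)), which is a rough stand-in for the default-prior reanalysis described above; the specific t and n values are hypothetical.

```python
import math

def bic_bf01(t, n):
    """Approximate Bayes factor in favor of the null (BF01) for a
    one-sample t-test, via the BIC approximation:
    BF01 ~= sqrt(n) * (1 + t**2 / (n - 1)) ** (-n / 2)."""
    return math.sqrt(n) * (1.0 + t**2 / (n - 1)) ** (-n / 2)

# A typical nonsignificant result: t = 1.5 with n = 30 (made-up numbers).
print(bic_bf01(1.5, 30))  # ~1.8: anecdotal, far below the BF01 > 10 mark
```

Even a perfectly null-consistent t of 0 needs a large sample before BF01 exceeds 10 on this approximation, which is one reason so few nonsignificant findings qualify as strong evidence for the null.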
Exploring the mechanisms of cognitive control is central to understanding how we control our behaviour. These mechanisms can be studied in conflict paradigms, which require the inhibition of irrelevant responses to perform the task. It has been suggested that in these tasks the detection of conflict enhances cognitive control, resulting in improved conflict resolution on subsequent trials. If this is the case, then this so-called congruency sequence effect can be expected to occur in cross-domain tasks as well. Previous research on the domain-generality of the effect has presented inconsistent results. In this study, we provide a multi-site replication of three previous experiments of Kan et al. (Kan IP, Teubner-Rhodes S, Drummey AB, Nutile L, Krupa L, Novick JM. 2013, 637-651), which tested the congruency sequence effect between very different domains: from a syntactic to a non-syntactic domain (Experiment 1), and from a perceptual to a verbal domain (Experiments 2 and 3). Despite all our efforts, we found only partial support for the claims of the original study. With a single exception, we could not replicate the original findings; the data remained inconclusive or went against the theoretical hypothesis. We discuss the compatibility of the results with alternative theoretical frameworks.
Most decision-making models describing individual differences in heuristics-and-biases tasks build on the assumption that reasoners produce a first, incorrect answer in a quick, automatic way, which they may or may not override later, and that the advantage of high-capacity reasoners arises from this late correction mechanism. To investigate this assumption, we developed a mouse-tracking analysis technique to capture individuals' first answers and subsequent thinking dynamics. Across two denominator neglect task experiments, we observed that individuals initially move the mouse cursor towards the correct answer option in a substantial number of cases, suggesting that reasoners may not always produce an incorrect answer first. Furthermore, we observed that, compared to low-capacity reasoners, high-capacity individuals revise their first answer more frequently if it is incorrect and make fewer changes if it is correct. However, we did not find evidence that high-capacity individuals produce correct initial answers more frequently. Consistent with the predictions of previous decision-making models, these results suggest that in the denominator neglect task the capacity-normativity relationship arises after the initial response is formulated. The present work demonstrates how the analysis of mouse trajectories can be used to investigate individual differences in decision-making and to better understand the dynamics of thinking behind decision biases.
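As a toy illustration of how a "first answer" might be read off a mouse trajectory: the function below classifies the first answer option that the cursor commits toward by a fixed fraction of the start-to-option distance. The function name, coordinate convention (left option at a smaller x than the start, right option at a larger x) and the commitment threshold are all invented for the sketch; the study's actual analysis technique is more involved.

```python
def first_answer(xs, x_start, x_left, x_right, commit_frac=0.2):
    """Toy sketch: infer the initial answer from horizontal mouse positions.
    Returns 'left' or 'right' for the first option the cursor moves toward
    by at least `commit_frac` of the distance from the start position to
    that option, or None if no such commitment occurs.
    Assumes x_left < x_start < x_right."""
    for x in xs:
        if (x_start - x) >= commit_frac * (x_start - x_left):
            return "left"
        if (x - x_start) >= commit_frac * (x_right - x_start):
            return "right"
    return None
```

Under this scheme, a trial whose cursor first commits toward the incorrect option but ends on the correct one would be scored as a revised first answer, which is the kind of event the capacity comparison above turns on.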
The low reproducibility rate in the social sciences has made researchers hesitant to accept published findings at face value. Despite the advent of initiatives to increase transparency in research reporting, the field still lacks tools to verify the credibility of research reports. In the present paper, we describe methodologies that let researchers craft highly credible research and allow their peers to verify this credibility. We demonstrate the application of these methods in a multi-laboratory replication of Bem's Experiment 1 (Bem 2011, 407-425. (doi:10.1037/a0021524)) on extrasensory perception (ESP), which was co-designed by a consensus panel including both proponents and opponents of Bem's original hypothesis. In the study, we applied direct data deposition in combination with born-open data and real-time research reports to extend transparency to protocol delivery and data collection. We also used piloting, checklists, laboratory logs and video-documented trial sessions to ascertain as-intended protocol delivery, and external research auditors to monitor research integrity. We found 49.89% successful guesses, whereas Bem reported a 53.07% success rate, with the chance level being 50%. Thus, Bem's findings were not replicated in our study. In the paper, we discuss the implementation, feasibility and perceived usefulness of the credibility-enhancing methodologies used throughout the project.
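To see why a 49.89% hit rate is statistically indistinguishable from the 50% chance level, one can run a simple exact binomial test. The trial count used below (10,000) is a made-up illustration, not the study's actual number of trials.

```python
from scipy.stats import binomtest

# Hypothetical counts: 4,989 hits in 10,000 guess trials (49.89%),
# tested against the 50% chance level.
result = binomtest(k=4989, n=10000, p=0.5, alternative="two-sided")
print(result.pvalue)  # far above any conventional significance level
```

With a deviation of only 0.11 percentage points from chance, even a trial count this large leaves the null of pure guessing comfortably unrejected.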