Most neuroscientists would agree that for brain research to progress, we have to know which experimental manipulations have no effect as much as we must identify those that do have an effect. The dominant statistical approaches used in neuroscience rely on P values and can establish the latter but not the former. This makes non-significant findings difficult to interpret: do they support the null hypothesis or are they simply not informative? Here we show how Bayesian hypothesis testing can be used in neuroscience studies to establish both whether there is evidence of absence and whether there is absence of evidence. Through simple tutorial-style examples of Bayesian t-tests and ANOVA using the open-source project JASP, this article aims to empower neuroscientists to use this approach to provide compelling and rigorous evidence for the absence of an effect.
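The default Bayesian t-test that the article demonstrates in JASP can also be computed directly. As an illustrative sketch only (not the article's code, and not necessarily identical to JASP's implementation), the Jeffreys–Zellner–Siow Bayes factor for a one-sample t-test reduces to a one-dimensional integral over the variance-inflation parameter g, with the default Cauchy prior on effect size expressed as an inverse-gamma mixture:

```python
import numpy as np
from scipy.integrate import quad

def jzs_bf10(t, n, r=np.sqrt(2) / 2):
    """Default (JZS) Bayes factor BF10 for a one-sample t-test.

    t : observed t-statistic, n : sample size,
    r : Cauchy prior scale on effect size (sqrt(2)/2 is a common default).
    """
    v = n - 1  # degrees of freedom
    # likelihood of t under H0 (up to a constant shared with H1)
    h0 = (1 + t**2 / v) ** (-(v + 1) / 2)

    # under H1, average the likelihood over g, where the Cauchy prior on
    # effect size corresponds to g ~ InverseGamma(1/2, r^2/2)
    def integrand(g):
        return ((1 + n * g) ** (-0.5)
                * (1 + t**2 / ((1 + n * g) * v)) ** (-(v + 1) / 2)
                * (r**2 / 2) ** 0.5 / np.sqrt(np.pi)
                * g ** (-1.5) * np.exp(-r**2 / (2 * g)))

    h1, _ = quad(integrand, 0, np.inf)
    return h1 / h0
```

A BF10 well below 1 quantifies evidence of absence, which is exactly what a non-significant P value cannot do.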
Recently there has been a growing concern that many published research findings do not hold up in attempts to replicate them. We argue that this problem may originate from a culture of ‘you can publish if you found a significant effect’. This culture creates a systematic bias against the null hypothesis which renders meta-analyses questionable and may even lead to a situation where hypotheses become difficult to falsify. In order to pinpoint the sources of error and possible solutions, we review current scientific practices with regard to their effect on the probability of drawing a false-positive conclusion. We explain why the proportion of published false-positive findings is expected to increase with (i) decreasing sample size, (ii) increasing pursuit of novelty, (iii) various forms of multiple testing and researcher flexibility, and (iv) incorrect P-values, especially due to unaccounted pseudoreplication, i.e. the non-independence of data points (clustered data). We provide examples showing how statistical pitfalls and psychological traps lead to conclusions that are biased and unreliable, and we show how these mistakes can be avoided. Ultimately, we hope to contribute to a culture of ‘you can publish if your study is rigorous’. To this end, we highlight promising strategies towards making science more objective. Specifically, we enthusiastically encourage scientists to preregister their studies (including a priori hypotheses and complete analysis plans), to blind observers to treatment groups during data collection and analysis, and to report all results unconditionally. Also, we advocate reallocating some efforts away from seeking novelty and discovery and towards replicating important research findings of one's own and of others for the benefit of the scientific community as a whole. We believe these efforts will be aided by a shift in evaluation criteria away from the current system which values metrics of ‘impact’ almost exclusively and towards a system which explicitly values indices of scientific rigour.
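The inflation of false positives under multiple testing, point (iii) above, is easy to demonstrate by simulation. The numbers below are illustrative and not from the review:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def familywise_error(k_tests, n=20, n_sim=2000, alpha=0.05):
    """Probability that at least one of k_tests independent two-sample
    t-tests on pure-noise data reaches p < alpha."""
    a = rng.standard_normal((n_sim, k_tests, n))  # group 1, all null
    b = rng.standard_normal((n_sim, k_tests, n))  # group 2, all null
    p = stats.ttest_ind(a, b, axis=-1).pvalue     # shape (n_sim, k_tests)
    return float(np.mean(p.min(axis=1) < alpha))

# one test keeps the false-positive rate near alpha = 0.05;
# twenty tests inflate it towards 1 - 0.95**20, roughly 0.64
```

With a single comparison the error rate stays near the nominal 5%; with twenty uncorrected comparisons, a "significant" result on noise alone becomes the most likely outcome.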
Statistical procedures such as Bayes factor model selection and Bayesian model averaging require the computation of normalizing constants (e.g., marginal likelihoods). These normalizing constants are notoriously difficult to obtain, as they usually involve high-dimensional integrals that cannot be solved analytically. Here we introduce an R package that uses bridge sampling (Meng and Wong, 1996; Meng and Schilling, 2002) to estimate normalizing constants in a generic and easy-to-use fashion. For models implemented in Stan, the estimation procedure is automatic. We illustrate the functionality of the package with three examples.
We propose a default Bayesian hypothesis test for the presence of a correlation or a partial correlation. The test is a direct application of Bayesian techniques for variable selection in regression models. The test is easy to apply and yields practical advantages that the standard frequentist tests lack; in particular, the Bayesian test can quantify evidence in favor of the null hypothesis and allows researchers to monitor the test results as the data come in. We illustrate the use of the Bayesian correlation test with three examples from the psychological literature. Computer code and example data are provided in the journal archives.
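The paper derives an exact default test; as a loose illustration of what a correlation Bayes factor does (this sketch uses the Fisher-z approximation as an assumption, not the paper's variable-selection derivation), one can average an approximate likelihood over a prior on the correlation:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def correlation_bf10(r, n):
    """Approximate BF10 for Pearson's rho with a uniform prior on (-1, 1).

    Relies on the Fisher-z approximation atanh(r) ~ N(atanh(rho), 1/(n-3));
    the exact default test in the paper does not make this approximation."""
    z_obs, se = np.arctanh(r), 1 / np.sqrt(n - 3)
    # marginal likelihood under H1: average the likelihood over the prior
    h1, _ = quad(lambda rho: norm.pdf(z_obs, np.arctanh(rho), se) * 0.5,
                 -1, 1)
    h0 = norm.pdf(z_obs, 0, se)  # likelihood under H0: rho = 0
    return h1 / h0
```

A value below 1 quantifies evidence in favor of the null, the practical advantage over frequentist tests that the abstract highlights.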
A growing number of researchers use descriptive distributions such as the ex-Gaussian and the shifted Wald to summarize response time data for speeded two-choice tasks. Some of these researchers also assume that the parameters of these distributions uniquely correspond to specific cognitive processes. We studied the validity of this cognitive interpretation by relating the parameters of the ex-Gaussian and shifted Wald distributions to those of the Ratcliff diffusion model, a successful model whose parameters have well-established cognitive interpretations. In a simulation study, we fitted the ex-Gaussian and shifted Wald distributions to data generated from the diffusion model by systematically varying its parameters across a wide range of plausible values. In an empirical study, the two descriptive distributions were fitted to published data that featured manipulations of task difficulty, response caution, and a priori bias. The results clearly demonstrate that the ex-Gaussian and shifted Wald parameters do not correspond uniquely to parameters of the diffusion model. We conclude that researchers should resist the temptation to interpret changes in the ex-Gaussian and shifted Wald parameters in terms of cognitive processes. Supporting materials may be downloaded from http://pbr.psychonomic-journals.org/content/supplemental.
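For readers unfamiliar with the ex-Gaussian: it is the sum of a Gaussian component (mu, sigma) and an exponential component (tau). A hypothetical fit using scipy's exponnorm (which parameterizes the distribution by K = tau/sigma) shows the purely descriptive summary the paper cautions against over-interpreting; the data and parameter values below are simulated, not from the paper:

```python
import numpy as np
from scipy.stats import exponnorm

rng = np.random.default_rng(2)

# simulate response times (seconds) as Gaussian(mu, sigma) + Exponential(tau)
mu, sigma, tau = 0.40, 0.05, 0.20
rt = rng.normal(mu, sigma, 5000) + rng.exponential(tau, 5000)

# scipy parameterizes the ex-Gaussian as exponnorm(K, loc, scale),
# with K = tau / sigma, loc = mu, scale = sigma
K_hat, mu_hat, sigma_hat = exponnorm.fit(rt)
tau_hat = K_hat * sigma_hat

# (mu_hat, sigma_hat, tau_hat) describe the shape of the RT distribution,
# but per the paper they do not map one-to-one onto diffusion-model processes
```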
Pearson's correlation is one of the most common measures of linear dependence. Recently, Bernardo (11th International Workshop on Objective Bayes Methodology, 2015) introduced a flexible class of priors to study this measure in a Bayesian setting. For this large class of priors, we show that the (marginal) posterior for Pearson's correlation coefficient and all of the posterior moments are analytic. Our results are available in the open-source software package JASP.
We describe a general method that allows experimenters to quantify the evidence from the data of a direct replication attempt given data already acquired from an original study. These so-called replication Bayes factors are a reconceptualization of the ones introduced by Verhagen and Wagenmakers (Journal of Experimental Psychology: General, 143(4), 1457–1475, 2014) for the common t test. This reconceptualization is computationally simpler and generalizes easily to most common experimental designs for which Bayes factors are available.
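The core idea can be sketched in a toy form (this is not the paper's general method; it uses the Fisher-z approximation for a correlation as a simplifying assumption): the posterior from the original study becomes the "proponent's" prior for the replication, and the replication Bayes factor compares how well that prior versus H0 predicts the replication data.

```python
import numpy as np
from scipy.stats import norm

def replication_bf(r_orig, n_orig, r_rep, n_rep):
    """Toy replication Bayes factor for a correlation on the Fisher-z scale.

    H0: rho = 0.  Proponent: the posterior of rho from the original study,
    which under a flat prior on z = atanh(rho) is N(z_orig, 1/(n_orig - 3))."""
    z_orig, var_orig = np.arctanh(r_orig), 1 / (n_orig - 3)
    z_rep, var_rep = np.arctanh(r_rep), 1 / (n_rep - 3)
    # predictive density for z_rep under the proponent's posterior:
    # two normals convolve, so the variances add
    m1 = norm.pdf(z_rep, z_orig, np.sqrt(var_orig + var_rep))
    m0 = norm.pdf(z_rep, 0.0, np.sqrt(var_rep))
    return m1 / m0  # > 1: replication data favor the original effect
```

A replication close to the original effect yields a large BF_rep, while a null replication result yields a BF_rep below 1, quantifying evidence that the effect failed to replicate.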
Harold Jeffreys pioneered the development of default Bayes factor hypothesis tests for standard statistical problems. Using Jeffreys’s Bayes factor hypothesis tests, researchers can grade the ...decisiveness of the evidence that the data provide for a point null hypothesis H0 versus a composite alternative hypothesis H1. Consequently, Jeffreys’s tests are of considerable theoretical and practical relevance for empirical researchers in general and for experimental psychologists in particular. To highlight this relevance and to facilitate the interpretation and use of Jeffreys’s Bayes factor tests we focus on two common inferential scenarios: testing the nullity of a normal mean (i.e., the Bayesian equivalent of the t-test) and testing the nullity of a correlation. For both Bayes factor tests, we explain their development, we extend them to one-sided problems, and we apply them to concrete examples from experimental psychology.
• The Bayes factor follows logically from Jeffreys's philosophy of model selection.
• The ideas are illustrated with two examples: the Bayesian t-test and correlation test.
• The Bayes factors are adapted to one-sided tests.
• The Bayes factors are illustrated with various applications in psychological research.
Informed Bayesian t-Tests. Gronau, Quentin F.; Ly, Alexander; Wagenmakers, Eric-Jan. The American Statistician, 04/2020, Volume 74, Issue 2. Journal article; peer reviewed; open access.
Across the empirical sciences, few statistical procedures rival the popularity of the frequentist t-test. In contrast, the Bayesian versions of the t-test have languished in obscurity. In recent years, however, the theoretical and practical advantages of the Bayesian t-test have become increasingly apparent, and various Bayesian t-tests have been proposed, both objective ones (based on general desiderata) and subjective ones (based on expert knowledge). Here, we propose a flexible t-prior for standardized effect size that allows computation of the Bayes factor by evaluating a single numerical integral. This specification contains previous objective and subjective t-test Bayes factors as special cases. Furthermore, we propose two measures for informed prior distributions that quantify the departure from the objective Bayes factor desiderata of predictive matching and information consistency. We illustrate the use of informed prior distributions based on an expert prior elicitation effort. Supplementary materials for this article are available online.
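The single numerical integral the abstract mentions can be sketched as follows (an illustrative reimplementation, not the authors' code): with a t(location, scale, df) prior on the standardized effect size delta, BF10 is the prior-averaged noncentral-t likelihood of the observed t-statistic divided by its central-t likelihood under H0. The prior values below are made-up examples, not the paper's elicited prior.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def informed_bf10(t_obs, n, loc=0.35, scale=0.10, df=3):
    """One-sample informed t-test Bayes factor (illustrative sketch).

    Prior on delta: a t-distribution with the given location, scale, and df
    (example values only, not an elicited expert prior)."""
    nu = n - 1
    prior = stats.t(df, loc=loc, scale=scale)

    # marginal likelihood under H1: a single numerical integral over delta,
    # using the noncentral-t density of the t-statistic
    def integrand(delta):
        return stats.nct.pdf(t_obs, nu, np.sqrt(n) * delta) * prior.pdf(delta)

    h1, _ = quad(integrand, -2.0, 3.0, points=[loc])
    h0 = stats.t.pdf(t_obs, nu)  # likelihood under H0: delta = 0
    return h1 / h0
```

With the prior located away from zero, a t-statistic near zero yields BF10 < 1 (evidence for the null), while data consistent with the anticipated effect yield BF10 > 1.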