Despite the increasing popularity of Bayesian inference in empirical research, few practical guidelines provide detailed recommendations for how to apply Bayesian procedures and interpret the results. Here we offer specific guidelines for four different stages of Bayesian statistical reasoning in a research setting: planning the analysis, executing the analysis, interpreting the results, and reporting the results. The guidelines for each stage are illustrated with a running example. Although the guidelines are geared towards analyses performed with the open-source statistical software JASP, most guidelines extend to Bayesian inference in general.
Over the years, researchers in psychological science have documented and investigated a host of powerful cognitive fallacies, including hindsight bias and confirmation bias. Researchers themselves may not be immune to these fallacies and may unwittingly adjust their statistical analysis to produce an outcome that is more pleasant or better in line with prior expectations. To shield researchers from the impact of cognitive fallacies, several methodologists are now advocating preregistration—that is, the creation of a detailed analysis plan before data collection or data analysis. One may argue, however, that preregistration is out of touch with academic reality, hampering creativity and impeding scientific progress. We provide a historical overview to show that the interplay between creativity and verification has shaped theories of scientific inquiry throughout the centuries; in the currently dominant theory, creativity and verification operate in succession and enhance one another’s effectiveness. From this perspective, the use of preregistration to safeguard the verification stage will help rather than hinder the generation of fruitful new ideas.
Facing the Unknown Unknowns of Data Analysis. Wagenmakers, Eric-Jan; Sarafoglou, Alexandra; Aczel, Balazs. Current Directions in Psychological Science, 10/2023, Volume 32, Issue 5. Journal article; peer reviewed; open access.
Empirical claims are inevitably associated with uncertainty, and a major goal of data analysis is therefore to quantify that uncertainty. Recent work has revealed that most uncertainty may lie not in what is usually reported (e.g., p value, confidence interval, or Bayes factor) but in what is left unreported (e.g., how the experiment was designed, whether the conclusion is robust under plausible alternative analysis protocols, and how credible the authors believe their hypothesis to be). This suggests that the rigorous evaluation of an empirical claim involves an assessment of the entire empirical cycle and that scientific progress benefits from radical transparency in planning, data management, inference, and reporting. We summarize recent methodological developments in this area and conclude that the focus on a single statistical analysis is myopic. Sound statistical analysis is important, but social scientists may gain more insight by taking a broad view on uncertainty and by working to reduce the “unknown unknowns” that still plague reporting practice.
The replicability of findings in experimental psychology can be improved by distinguishing sharply between hypothesis-generating research and hypothesis-testing research. This distinction can be achieved by preregistration, a method that has recently attracted widespread attention. Although preregistration is fair in the sense that it inoculates researchers against hindsight bias and confirmation bias, preregistration does not allow researchers to analyze the data flexibly without the analysis being demoted to exploratory. To alleviate this concern we discuss how researchers may conduct blinded analyses (MacCoun and Perlmutter in Nature 526:187–189, 2015). As with preregistration, blinded analyses break the feedback loop between the analysis plan and analysis outcome, thereby preventing cherry-picking and significance seeking. However, blinded analyses retain the flexibility to account for unexpected peculiarities in the data. We discuss different methods of blinding, offer recommendations for blinding of popular experimental designs, and introduce the design for an online blinding protocol.
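To make the blinding idea concrete, here is a minimal sketch in R of one blinding method discussed in this literature: shuffling condition labels so that the analyst can finalize the pipeline without seeing the true label-outcome link. The data, variable names, and helper function are hypothetical illustrations, not part of the protocol introduced above.

```r
# Minimal sketch of label shuffling as a blinding method (hypothetical data).
set.seed(123)
dat <- data.frame(
  rt        = rnorm(100, mean = 500, sd = 50),          # reaction times (ms)
  condition = rep(c("treatment", "control"), each = 50) # true group labels
)

# Permuting the labels makes group comparisons uninformative about the real
# effect, while outliers, skew, and missingness remain visible to the analyst.
blind_labels <- function(d) {
  d$condition <- sample(d$condition)
  d
}

blinded <- blind_labels(dat)
# The analyst develops and freezes the analysis plan on `blinded`;
# the true labels are restored only afterwards.
```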
The multibridge R package allows a Bayesian evaluation of informed hypotheses Hr applied to frequency data from an independent binomial or multinomial distribution. multibridge uses bridge sampling to efficiently compute Bayes factors for the following hypotheses concerning the latent category proportions θ: (a) hypotheses that postulate equality constraints (e.g., θ1 = θ2 = θ3); (b) hypotheses that postulate inequality constraints (e.g., θ1 < θ2 < θ3 or θ1 > θ2 > θ3); (c) hypotheses that postulate combinations of inequality constraints and equality constraints (e.g., θ1 < θ2 = θ3); and (d) hypotheses that postulate combinations of (a)–(c) (e.g., θ1 < (θ2 = θ3), θ4). Any informed hypothesis Hr may be compared against the encompassing hypothesis He that all category proportions vary freely, or against the null hypothesis H0 that all category proportions are equal. multibridge facilitates the fast and accurate comparison of large models with many constraints and models for which relatively little posterior mass falls in the restricted parameter space. This paper describes the underlying methodology and illustrates the use of multibridge through fully reproducible examples.
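To give a flavor of how such a comparison is specified, the sketch below evaluates an order constraint with the package's mult_bf_informed() function. The counts are hypothetical, and the argument names and the string syntax for Hr are assumptions based on the paper's examples; they should be verified against the package documentation.

```r
# Minimal sketch: Bayes factor for an informed order hypothesis Hr against
# the encompassing hypothesis He (hypothetical counts; verify the interface
# against the multibridge documentation).
library(multibridge)

x             <- c(120, 95, 80)                   # observed category counts
factor_levels <- c("theta1", "theta2", "theta3")  # category labels
a             <- c(1, 1, 1)                       # Dirichlet prior parameters
Hr            <- c("theta1 > theta2 > theta3")    # informed order hypothesis

fit <- mult_bf_informed(x = x,
                        Hr = Hr,
                        a = a,
                        factor_levels = factor_levels,
                        seed = 2024)
summary(fit)  # reports the Bayes factor for Hr versus He
```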
The preregistration of research protocols and analysis plans is a main reform innovation to counteract confirmation bias in the social and behavioural sciences. While theoretical reasons to preregister are frequently discussed in the literature, the individually experienced advantages and disadvantages of this method remain largely unexplored. The goal of this exploratory study was to identify the perceived benefits and challenges of preregistration from the researcher's perspective. To this end, we surveyed 355 researchers, 299 of whom had used preregistration in their own work. The researchers indicated the experienced or expected effects of preregistration on their workflow. The results show that experiences and expectations are mostly positive. Researchers in our sample believe that implementing preregistration improves or is likely to improve the quality of their projects. Criticism of preregistration is primarily related to the increase in work-related stress and the overall duration of the project. While the benefits outweighed the challenges for the majority of researchers with preregistration experience, this was not the case for the majority of researchers without preregistration experience. The experienced advantages and disadvantages identified in our survey could inform future efforts to improve preregistration and thus help the methodology gain greater acceptance in the scientific community.
Ordinal-scale items—say items that assess agreement with a proposition on an ordinal rating scale from strongly disagree to strongly agree—are exceedingly popular in psychological research. A common research question concerns the comparison of response distributions on ordinal-scale items across conditions. In this context, there is often a lingering question of whether metric-level descriptions of the results and parametric tests are appropriate. We consider a different problem, perhaps one that supersedes the parametric-vs-nonparametric issue: When is it appropriate to reduce the comparison of two (ordinal) distributions to the comparison of simple summary statistics (e.g., measures of location)? In this paper, we provide a Bayesian modeling approach to help researchers perform meaningful comparisons of two response distributions and draw appropriate inferences from ordinal-scale items. We develop four statistical models that represent possible relationships between two distributions: an unconstrained model representing a complex, non-ordinal relationship, a nonparametric stochastic-dominance model, a parametric shift model, and a null model representing equivalence in distribution. We show how these models can be compared in light of data with Bayes factors and illustrate their usefulness with two real-world examples. We also provide a freely available web applet for researchers who wish to adopt the approach.
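As a simple illustration of the ordering that the stochastic-dominance model formalizes (this sketch compares empirical distributions directly and is not the authors' Bayesian model), one can check whether one ordinal sample's cumulative distribution lies at or below the other's everywhere; the 5-point ratings below are hypothetical.

```r
# Minimal sketch: empirical check of stochastic dominance between two
# ordinal response distributions (hypothetical 5-point ratings).
ratings_a <- c(1, 2, 2, 3, 3, 3, 4, 5)
ratings_b <- c(2, 3, 3, 4, 4, 4, 5, 5)

cdf_a <- cumsum(prop.table(table(factor(ratings_a, levels = 1:5))))
cdf_b <- cumsum(prop.table(table(factor(ratings_b, levels = 1:5))))

# B stochastically dominates A if B's CDF lies at or below A's at every level.
all(cdf_b <= cdf_a)  # TRUE here: group B tends toward higher ratings
```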
We argue that statistical practice in the social and behavioural sciences benefits from transparency, a fair acknowledgement of uncertainty and openness to alternative interpretations. Here, to promote such a practice, we recommend seven concrete statistical procedures: (1) visualizing data; (2) quantifying inferential uncertainty; (3) assessing data preprocessing choices; (4) reporting multiple models; (5) involving multiple analysts; (6) interpreting results modestly; and (7) sharing data and code. We discuss their benefits and limitations, and provide guidelines for adoption. Each of the seven procedures finds inspiration in Merton's ethos of science as reflected in the norms of communalism, universalism, disinterestedness and organized scepticism. We believe that these ethical considerations, as well as their statistical consequences, establish common ground among data analysts, despite continuing disagreements about the foundations of statistical inference.
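As one concrete instance, procedure (2), quantifying inferential uncertainty, can be as simple as reporting an interval estimate next to the point estimate; the sketch below does so with a nonparametric bootstrap on simulated data and is only one of many ways to implement the recommendation.

```r
# Minimal sketch of procedure (2): a bootstrap confidence interval for the
# mean of a (simulated) sample, reported alongside the point estimate.
set.seed(1)
x <- rnorm(40, mean = 0.3)
boot_means <- replicate(5000, mean(sample(x, replace = TRUE)))
c(estimate = mean(x), quantile(boot_means, c(0.025, 0.975)))
```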
Although preregistration can reduce researcher bias and increase transparency in primary research settings, it is less applicable to secondary data analysis. An alternative method that affords additional protection from researcher bias, which cannot be gained from conventional forms of preregistration alone, is an Explore and Confirm Analysis Workflow (ECAW). In this workflow, a data management organization initially provides access to only a subset of their dataset to researchers who request it. The researchers then prepare an analysis script based on the subset of data, upload the analysis script to a registry, and then receive access to the full dataset. ECAWs aim to achieve similar goals to preregistration, but make access to the full dataset contingent on compliance. The present survey aimed to garner information from the research community where ECAWs could be applied, employing the Avon Longitudinal Study of Parents and Children (ALSPAC) as a case example.
We emailed a Web-based survey to researchers who had previously applied for access to ALSPAC's transgenerational observational dataset.
We received 103 responses, for a 9% response rate. The results suggest that, at least among our sample of respondents, ECAWs hold the potential to serve their intended purpose and appear relatively acceptable. For example, only 10% of respondents disagreed that ALSPAC should run a study on ECAWs (versus 55% who agreed). However, as many as 26% of respondents agreed that they would be less willing to use ALSPAC data if they were required to use an ECAW (versus 45% who disagreed).
Our data and findings provide information for organizations and individuals interested in implementing ECAWs and related interventions.
Preregistration: https://osf.io/g2fw5. Deviations from the preregistration are outlined in electronic supplementary material A.
In the 2020 Neuroimaging Analysis Replication and Prediction Study, 70 teams used the same functional magnetic resonance imaging (fMRI) data to test 9 hypotheses about brain activity in a risky-decision task. Here again, the coordinators concluded that differences in findings were due not to errors, but to the wide range of alternative plausible analysis decisions and statistical models. ... journals could invite teams to contribute their own analyses in the form of comments on a recently accepted article. ... multi-analyst projects usually excel in data sharing, transparent reporting and theory-driven research.