A prominent approach to studying the replication crisis has been to conduct replications of several different scientific findings as part of the same research effort. The reported proportions of findings that these programs determined failed to replicate have become important statistics in the replication crisis. However, these "failure rates" are based on decisions about whether individual studies replicated, which are themselves subject to statistical uncertainty. In this article, we examine how that uncertainty impacts the accuracy of reported failure rates and find that the reported failure rates can be substantially biased and highly variable. Indeed, very high or very low failure rates could arise from chance alone.
Abstract
Formal empirical assessments of replication have recently become more prominent in several areas of science, including psychology. These assessments have used different statistical approaches to determine if a finding has been replicated. The purpose of this article is to provide several alternative conceptual frameworks that lead to different statistical analyses to test hypotheses about replication. All of these analyses are based on statistical methods used in meta-analysis. The differences among the methods described involve whether the burden of proof is placed on replication or nonreplication, whether replication is exact or allows for a small amount of "negligible heterogeneity," and whether the studies observed are assumed to be fixed (constituting the entire body of relevant evidence) or are a sample from a universe of possibly relevant studies. The statistical power of each of these tests is computed and shown to be low in many cases, raising issues of the interpretability of tests for replication.
Translational Abstract
The idea that a finding can be replicated is fundamental to scientific progress. However, several recent studies have called into question the replicability of findings in different fields, including psychology. These studies have garnered attention both in academia and in the popular press, and have become important evidence of a crisis in science. On its face, replication seems like a straightforward idea: just repeat an experiment and check that you get the same results. However, authors of replication studies have noted that it is not that simple. Indeed, analyses of these studies have revealed that we might mean several different (and conflicting) things when we refer to "results" being "the same." This article attempts to clarify some of this ambiguity. It describes a way to precisely define when study results are the same. It also provides analyses that test whether data from replicate studies are consistent with that definition. In general, we find that defining replication and properly framing the analysis requires serious effort, and that unless several studies are conducted, the results of analyses about replication may be inconclusive.
In this study, we reanalyze recent empirical research on replication from a meta-analytic perspective. We argue that there are different ways to define "replication failure," and that analyses can focus on exploring variation among replication studies or assess whether their results contradict the findings of the original study. We apply this framework to a set of psychological findings that have been replicated and assess the sensitivity of these analyses. We find that tests for replication that involve only a single replication study are almost always severely underpowered. Among the 40 findings for which ensembles of multisite direct replications were conducted, we find that between 11 and 17 (28% to 43%) ensembles produced heterogeneous effects, depending on how replication is defined. This heterogeneity could not be completely explained by moderators documented by replication research programs. We also find that these ensembles were not always well-powered to detect potentially meaningful values of heterogeneity. Finally, we identify several discrepancies between the results of original studies and the distribution of effects found by multisite replications but note that these analyses also have low power. We conclude by arguing that efforts to assess replication would benefit from further methodological work on designing replication studies to ensure analyses are sufficiently sensitive.
Public Significance Statement
Replication is critical to building reliable scientific knowledge. This article argues that a meta-analytic approach can shed greater light on whether a finding is replicable and applies this approach to empirical research on replication in psychology. It also reports the sensitivity of those analyses.
Esophagogastric junction (EGJ) outflow obstruction (EGJOO) per Chicago Classification v4.0 (CCv4.0) represents a high-resolution manometry (HRM) diagnosis with uncertain clinical significance. This study aimed to evaluate functional lumen imaging probe (FLIP) panometry among patients with EGJOO on HRM/CCv4.0 to assess clinical/manometric associations and treatment outcomes.
An observational cohort study was performed on patients who completed FLIP during endoscopy and had an HRM/CCv4.0 diagnosis of EGJOO, i.e., HRM-EGJOO (inconclusive). Abnormal FLIP panometry motility classifications were applied to identify FLIP-confirmed conclusive EGJOO. Rapid drink challenge on HRM and timed barium esophagram were also assessed. Clinical management plan was determined by treating physicians and assessed through chart review. Clinical outcome was defined using the Eckardt score (ES) during follow-up evaluation: ES < 3 was considered a good outcome.
Of 139 adult patients with manometric EGJOO (inconclusive per CCv4.0), a treatment outcome ES was obtained in 55 after achalasia-type treatment (i.e., pneumatic dilation, peroral endoscopic myotomy, laparoscopic Heller myotomy, or botulinum toxin injection) and 36 patients after other nonachalasia-type treatment. Among patients with conclusive EGJOO by HRM-FLIP complementary impression, 77% (33/43) had a good outcome after achalasia-type treatment, whereas 0% (0/12) of patients had a good outcome after nonachalasia-type treatment. Of patients with normal EGJ opening on FLIP, one-third of patients treated with achalasia-type treatment had a good outcome, while 9 of the 10 treated conservatively had a good outcome.
FLIP panometry provides a useful complement to clarify the clinical significance of an HRM/CCv4.0 EGJOO diagnosis and help direct management decisions.
To assess the odds of pregnancy after intrauterine insemination (IUI) timed by ultrasound monitoring and human chorionic gonadotropin (hCG) administration compared with monitoring luteinizing hormone (LH) levels.
We searched PubMed (MEDLINE), EMBASE (Elsevier), Scopus (Elsevier), Web of Science (Clarivate Analytics), ClinicalTrials.gov (National Institutes of Health), and the Cochrane Library (Wiley) from inception until October 1, 2022. No language limitations were applied.
After deduplication, 3,607 unique citations were subjected to blinded independent review by three investigators. Thirteen studies (five retrospective cohort, four cross-sectional, two randomized controlled trials, and two randomized crossover studies) that enrolled women undergoing natural cycle, oral medication (clomid or letrozole), or both for IUI were included in the final random-effects model meta-analysis. Methodologic quality of included studies was assessed with the Downs and Black checklist.
Data extraction was compiled by two authors, including publication information, hCG and LH monitoring guidelines, and pregnancy outcomes. No significant difference in odds of pregnancy between hCG administration and endogenous LH monitoring was observed (odds ratio [OR] 0.92, 95% CI 0.69-1.22, P = .53). Subgroup analysis of the five studies that included natural cycle IUI outcomes also showed no significant difference in odds of pregnancy between the two methods (OR 0.88, 95% CI 0.46-1.69, P = .61). Finally, a subgroup analysis of 10 studies that included women who underwent ovarian stimulation with oral medications (clomid or letrozole) did not demonstrate a difference in odds of pregnancy between ultrasonography with hCG trigger and LH-timed IUI (OR 0.88, 95% CI 0.66-1.16, P = .32). Statistically significant heterogeneity was noted between studies.
This meta-analysis showed no difference in pregnancy outcomes between IUI timed by at-home LH monitoring and IUI timed by ultrasonography with hCG trigger.
PROSPERO, CRD42021230520.
In this tutorial, we examine methods for exploring missingness in a dataset in ways that can help to identify the sources and extent of missingness, as well as clarify gaps in evidence.
Using raw data from a meta-analysis of substance abuse interventions, we demonstrate the use of exploratory missingness analysis (EMA) including techniques for numerical summaries and visual displays of missing data.
These techniques examine the patterns of missing covariates in meta-analysis data and the relationships among variables with missing data and observed variables including the effect size. The case study shows complex relationships among missingness and other potential covariates in meta-regression, highlighting gaps in the evidence base.
Meta-analysts could often benefit by employing some form of EMA as they encounter missing data.
The problem of assessing whether experimental results can be replicated is becoming increasingly important in many areas of science. It is often assumed that assessing replication is straightforward: All one needs to do is repeat the study and see whether the results of the original and replication studies agree. This article shows that the power of the statistical test for whether two studies obtain the same effect is smaller than the power of either study to detect an effect in the first place. Thus, unless the original study and the replication study have unusually high power (e.g., power of 98%), a single replication study will not have adequate sensitivity to provide an unambiguous evaluation of replication.
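The power comparison in this abstract can be illustrated with a minimal numerical sketch. It is not the authors' code, and it rests on assumptions that are not stated in the abstract: both studies use two-sided z-tests at the 5% level, both have the same sampling variance, and the true difference between the two studies' effects is as large as the original effect itself (that is, the replication's true effect is zero). The function names are hypothetical.

```python
from math import erf, sqrt

Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def two_sided_power(ncp: float) -> float:
    """Power of a two-sided z-test whose statistic has mean `ncp` and variance 1."""
    return (1.0 - normal_cdf(Z_CRIT - ncp)) + normal_cdf(-Z_CRIT - ncp)


def difference_test_power(ncp_single: float) -> float:
    """Power of the test that two studies share the same effect.

    Assumption (ours): equal sampling variance v in both studies and a true
    difference equal to the full original effect. The difference T1 - T2 then
    has variance 2 * v, so its standardized mean is ncp_single / sqrt(2).
    """
    return two_sided_power(ncp_single / sqrt(2.0))


# Standardized effects (effect / standard error) that give a single study
# roughly 80% and roughly 98% power, respectively.
for ncp in (2.8016, 4.0137):
    single = two_sided_power(ncp)
    difference = difference_test_power(ncp)
    print(f"single-study power {single:.2f} -> difference-test power {difference:.2f}")
```

Under these assumptions, two studies that each have about 80% power yield a difference test with power of roughly 51%, and only when each study has about 98% power does the difference test reach roughly 81%, which is consistent with the 98% figure quoted in the abstract.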
An association of eosinophilic esophagitis (EoE) with esophageal dysmotility has been described; however, the related mechanism remains unclear. We aimed to evaluate clinical and physiologic characteristics, including esophageal distensibility, associated with secondary peristalsis in patients with EoE.
A total of 199 consecutive adult patients with EoE (age, 18-78 y; 32% female) who completed a 16-cm functional luminal imaging probe (FLIP) during endoscopy were evaluated in a cross-sectional study. FLIP panometry contractile response (CR) patterns were classified as normal CR or borderline CR if antegrade contractions were present, and abnormal CRs included impaired/disordered CR, absent CR, or spastic-reactive CR. The distensibility plateau of the esophageal body and esophagogastric junction distensibility was measured with FLIP.
FLIP CR patterns included 68 (34%) normal CR, 65 (33%) borderline CR, 44 (22%) impaired/disordered CR, 16 (8%) absent CR, and 6 (3%) spastic-reactive CR. Compared with normal CRs, abnormal CRs more frequently had reduced esophageal distensibility (distensibility plateau <17 mm in 56% vs 32%), greater total EoE reference scores (median, 5; interquartile range [IQR], 3-6 vs median, 4; IQR, 3-5) with more severe ring scores, and a greater duration of symptoms (median, 10 y; IQR, 4-23 y vs median, 7 y; IQR, 3-15 y). Mucosal eosinophil density, however, was similar between abnormal CRs and normal CRs (median, 34 eosinophils/high-power field [hpf]; IQR, 14-60 eosinophils/hpf vs median, 25 eosinophils/hpf; IQR, 5-50 eosinophils/hpf).
Although normal secondary peristalsis was observed frequently in this EoE cohort, abnormal esophageal CRs were related to EoE disease severity, especially features of fibrostenosis. This study evaluating secondary peristalsis in EoE suggests that esophageal wall remodeling, rather than eosinophilic inflammatory intensity, was associated with esophageal dysmotility in EoE.
Chronic rhinosinusitis with nasal polyps is frequently managed with endoscopic sinus surgery (ESS). Prior studies describe individual clinical variables and eosinophil density measures as prognostic for polyp recurrence (PR). However, the relative prognostic significance of these has not been extensively investigated.
We sought to evaluate the impact of PR on measures of disease severity post-ESS and quantify the prognostic value of various clinical variables and biomarkers.
Ninety-four patients with chronic rhinosinusitis with nasal polyps and prospectively biobanked polyp homogenates at the time of ESS were recruited 2 to 5 years post-ESS. Patients were evaluated with patient-reported outcome measures and endoscopic and radiographic scoring pre- and post-ESS. Biomarkers in polyp homogenates were measured with ELISA and Luminex. Relaxed least absolute shrinkage and selection operator regression optimized predictive clinical, biomarker, and combined models. Model performance was assessed using receiver-operating characteristic curve and random forest analysis.
PR was found in 39.4% of patients, despite significant improvements in modified Lund-Mackay (MLM) radiographic and 22-item Sinonasal Outcomes Test scores (both P < .0001). PR was significantly associated with worse post-ESS MLM, modified Lund-Kennedy, and 22-item Sinonasal Outcomes Test scores. Relaxed least absolute shrinkage and selection operator identified 2 clinical predictors (area under the curve = 0.79) and 3 biomarkers (area under the curve = 0.78) that were prognostic for PR. A combined model incorporating these pre-ESS factors (MLM, asthma, eosinophil cationic protein, anti–double-stranded DNA IgG, and IL-5) improved PR predictive accuracy to an area under the curve of 0.89. Random forest analysis identified and validated each of the 5 variables as the strongest predictors of PR.
PR had strong associations with patient-reported outcome measures and with endoscopic and radiographic severity. A combined model comprising eosinophil cationic protein, IL-5, pre-ESS MLM, asthma, and anti–double-stranded DNA IgG could accurately predict PR.
The design of replication studies
Hedges, Larry V.; Schauer, Jacob M.
Journal of the Royal Statistical Society. Series A, Statistics in Society, July 2021, Volume 184, Issue 3
Journal Article
Peer reviewed
Empirical evaluations of replication have become increasingly common, but there has been no unified approach to doing so. Some evaluations conduct only a single replication study while others run several, usually across multiple laboratories. Designing such programs has largely contended with difficult issues about which experimental components are necessary for a set of studies to be considered replications. However, another important consideration is that replication studies be designed to support sufficiently sensitive analyses. For instance, if hypothesis tests are to be conducted about replication, studies should be designed to ensure these tests are well-powered; if not, it can be difficult to determine conclusively if replication attempts succeeded or failed. This paper describes methods for designing ensembles of replication studies to ensure that they are both adequately sensitive and cost-efficient. It describes two potential analyses of replication studies (hypothesis tests and variance component estimation) and approaches to obtaining optimal designs for them. Using these results, it assesses the statistical power, precision of point estimators and optimality of the design used by the Many Labs Project and finds that while it may have been sufficiently powered to detect some larger differences between studies, other designs would have been less costly and/or produced more precise estimates or higher-powered hypothesis tests.