IMPORTANCE: The use and misuse of P values has generated extensive debates. OBJECTIVE: To evaluate in large scale the P values reported in the abstracts and full text of biomedical research articles over the past 25 years and determine how frequently statistical information is presented in ways other than P values. DESIGN: Automated text-mining analysis was performed to extract data on P values reported in 12 821 790 MEDLINE abstracts and in 843 884 abstracts and full-text articles in PubMed Central (PMC) from 1990 to 2015. Reporting of P values in 151 English-language core clinical journals and specific article types as classified by PubMed also was evaluated. A random sample of 1000 MEDLINE abstracts was manually assessed for reporting of P values and other types of statistical information; of those abstracts reporting empirical data, 100 articles were also assessed in full text. MAIN OUTCOMES AND MEASURES: P values reported. RESULTS: Text mining identified 4 572 043 P values in 1 608 736 MEDLINE abstracts and 3 438 299 P values in 385 393 PMC full-text articles. Reporting of P values in abstracts increased from 7.3% in 1990 to 15.6% in 2014. In 2014, P values were reported in 33.0% of abstracts from the 151 core clinical journals (n = 29 725 abstracts), 35.7% of meta-analyses (n = 5620), 38.9% of clinical trials (n = 4624), 54.8% of randomized controlled trials (n = 13 544), and 2.4% of reviews (n = 71 529). The distribution of reported P values in abstracts and in full text showed strong clustering at P values of .05 and of .001 or smaller. Over time, the “best” (most statistically significant) reported P values were modestly smaller and the “worst” (least statistically significant) reported P values became modestly less significant. Among the MEDLINE abstracts and PMC full-text articles with P values, 96% reported at least 1 P value of .05 or lower, with the proportion remaining steady over time in PMC full-text articles.
In 1000 abstracts that were manually reviewed, 796 were from articles reporting empirical data; P values were reported in 15.7% (125/796 [95% CI, 13.2%-18.4%]) of abstracts, confidence intervals in 2.3% (18/796 [95% CI, 1.3%-3.6%]), Bayes factors in 0% (0/796 [95% CI, 0%-0.5%]), effect sizes in 13.9% (111/796 [95% CI, 11.6%-16.5%]), other information that could lead to estimation of P values in 12.4% (99/796 [95% CI, 10.2%-14.9%]), and qualitative statements about significance in 18.1% (181/1000 [95% CI, 15.8%-20.6%]); only 1.8% (14/796 [95% CI, 1.0%-2.9%]) of abstracts reported at least 1 effect size and at least 1 confidence interval. Among 99 manually extracted full-text articles with data, 55 reported P values, 4 presented confidence intervals for all reported effect sizes, none used Bayesian methods, 1 used false-discovery rates, 3 used sample size/power calculations, and 5 specified the primary outcome. CONCLUSIONS AND RELEVANCE: In this analysis of P values reported in MEDLINE abstracts and in PMC articles from 1990-2015, more MEDLINE abstracts and articles reported P values over time, almost all abstracts and articles with P values reported statistically significant results, and, in a subgroup analysis, few articles included confidence intervals, Bayes factors, or effect sizes. Rather than reporting isolated P values, articles should include effect sizes and uncertainty metrics.
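The abstract above does not describe the text-mining rules themselves. As a rough illustration only, the sketch below shows one way P values might be pulled out of abstract text with a regular expression. The pattern `P_VALUE`, the helper names, and the sample sentence are all hypothetical and are not the study's actual extraction method.

```python
import re

# Hypothetical pattern (an assumption, not the study's rule set): matches
# forms such as "P = .03", "p<0.001", or "P ≤ .05".
P_VALUE = re.compile(r"\b[Pp]\s*(<=|>=|≤|≥|<|>|=)\s*(0?\.\d+)")


def extract_p_values(text):
    """Return (operator, value) pairs for each P value found in the text."""
    return [(op, float(num)) for op, num in P_VALUE.findall(text)]


def has_significant_p(text, threshold=0.05):
    """True if at least one reported P value is at or below the threshold."""
    return any(
        op in ("<", "<=", "≤", "=") and value <= threshold
        for op, value in extract_p_values(text)
    )


# Made-up sentence used only to exercise the functions.
sample = "Recurrence was lower (p = 0.004); fluoroscopy time differed (P < .001)."
print(extract_p_values(sample))
print(has_significant_p(sample))
```

A pattern this simple would miss P values written in words or scientific notation, so a real pipeline would likely need more rules and manual validation against a sample, as the study did with its 1000 manually assessed abstracts.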
In this Viewpoint, John Ioannidis and colleagues discuss the proliferation of undervalidated or unvalidated clinical prediction models (CPMs) and propose an open-source repository where risk prediction scores could be updated in real time and validated, as a means to facilitate identification of the best-performing models and reduce the creation of duplicate or unhelpful ones.
Metabolomics is the field of "-omics" research concerned with the comprehensive characterization of the small low-molecular-weight metabolites in biological samples. In epidemiology, it represents an emerging technology and an unprecedented opportunity to measure environmental and other exposures with improved precision and far less measurement error than with standard epidemiologic methods. Advances in the application of metabolomics in large-scale epidemiologic research are now being realized through a combination of improved sample preparation and handling, automated laboratory and processing methods, and reduction in costs. The number of epidemiologic studies that use metabolic profiling is still limited, but metabolic profiling is fast gaining popularity in this area. In the present article, we present a roadmap for metabolomic analyses in epidemiologic studies and discuss the various challenges these data pose to large-scale studies. We discuss the steps of data preprocessing, univariate and multivariate data analysis, correction for multiplicity of comparisons with correlated data, and finally the steps of cross-validation and external validation. As data from metabolomic studies accumulate in epidemiology, there is a need for large-scale replication and synthesis of findings, increased availability of raw data, and a focus on good study design, all of which will highlight the potential clinical impact of metabolomics in this field.
Meta-assessment of bias in science. Fanelli, Daniele; Costas, Rodrigo; Ioannidis, John P. A.
Proceedings of the National Academy of Sciences - PNAS, 04/2017, Volume 114, Issue 14
Journal Article. Peer reviewed. Open access.
Numerous biases are believed to affect the scientific literature, but their actual prevalence across disciplines is unknown. To gain a comprehensive picture of the potential imprint of bias in science, we probed for the most commonly postulated bias-related patterns and risk factors, in a large random sample of meta-analyses taken from all disciplines. The magnitude of these biases varied widely across fields and was overall relatively small. However, we consistently observed a significant risk of small, early, and highly cited studies to overestimate effects and of studies not published in peer-reviewed journals to underestimate them. We also found at least partial confirmation of previous evidence suggesting that US studies and early studies might report more extreme effects, although these effects were smaller and more heterogeneously distributed across meta-analyses and disciplines. Authors publishing at high rates and receiving many citations were, overall, not at greater risk of bias. However, effect sizes were likely to be overestimated by early-career researchers, those working in small or long-distance collaborations, and those responsible for scientific misconduct, supporting hypotheses that connect bias to situational factors, lack of mutual control, and individual integrity. Some of these patterns and risk factors might have modestly increased in intensity over time, particularly in the social sciences. Our findings suggest that, besides one being routinely cautious that published small, highly-cited, and earlier studies may yield inflated results, the feasibility and costs of interventions to attenuate biases in the literature might need to be discussed on a discipline-specific and topic-specific basis.
The aim of this study was to investigate whether the combination of conventional pulmonary vein isolation (PVI) by circumferential antral ablation with ganglionated plexi (GP) modification in a single ablation procedure yields higher success rates than PVI or GP ablation alone in patients with paroxysmal atrial fibrillation (PAF).
Conventional PVI transects the major left atrial GP, and it is possible that autonomic denervation by inadvertent GP ablation plays a central role in the efficacy of PVI.
A total of 242 patients with symptomatic PAF were recruited and randomized as follows: 1) circumferential PVI (n = 78); 2) anatomic ablation of the main left atrial GP (n = 82); or 3) circumferential PVI followed by anatomic ablation of the main left atrial GP (n = 82). The primary endpoint was freedom from atrial fibrillation (AF) or other sustained atrial tachycardia (AT), verified by monthly visits, ambulatory electrocardiographic monitoring, and implantable loop recorders, during a 2-year follow-up period.
Freedom from AF or AT was achieved in 44 (56%), 39 (48%), and 61 (74%) patients in the PVI, GP, and PVI+GP groups, respectively (p = 0.004 by log-rank test). PVI+GP ablation strategy compared with PVI alone yielded a hazard ratio of 0.53 (95% confidence interval: 0.31 to 0.91; p = 0.022) for recurrence of AF or AT. Fluoroscopy duration was 16 ± 3 min, 20 ± 5 min, and 23 ± 5 min for PVI, GP, and PVI+GP groups, respectively (p < 0.001). Post-ablation atrial flutter did not differ between groups: 5.1% in PVI, 4.9% in GP, and 6.1% in PVI+GP. No serious adverse procedure-related events were encountered.
Addition of GP ablation to PVI confers a significantly higher success rate compared with either PVI or GP alone in patients with PAF.
Background: Oversights in the physical examination are a type of medical error not easily studied by chart review. They may be a major contributor to missed or delayed diagnosis, unnecessary exposure to contrast and radiation, incorrect treatment, and other adverse consequences. Our purpose was to collect vignettes of physical examination oversights and to capture the diversity of their characteristics and consequences. Methods: In this cross-sectional study, an 11-question qualitative survey for physicians was distributed electronically, with data collected from February to June of 2011. The participants were all physicians responding to e-mail or social media invitations to complete the survey. There were no limitations on geography, specialty, or practice setting. Results: Of the 208 reported vignettes that met inclusion criteria, the oversight was caused by a failure to perform the physical examination in 63%; 14% reported that the correct physical examination sign was elicited but misinterpreted, whereas 11% reported that the relevant sign was missed or not sought. Consequences of the physical examination inadequacy included missed or delayed diagnosis in 76% of cases, incorrect diagnosis in 27%, unnecessary treatment in 18%, no or delayed treatment in 42%, unnecessary diagnostic cost in 25%, unnecessary exposure to radiation or contrast in 17%, and complications caused by treatments in 4%. The mode of the number of physicians missing the finding was 2, but many oversights were missed by many physicians. Most oversights took up to 5 days to identify, but 66 took longer. Special attention and skill in examining the skin and its appendages, as well as the abdomen, groin, and genitourinary area, could reduce the reported oversights by half. Conclusions: Physical examination inadequacies are a preventable source of medical error, and adverse events are caused mostly by failure to perform the relevant examination.
IMPORTANCE: Convalescent plasma is a proposed treatment for COVID-19. OBJECTIVE: To assess clinical outcomes with convalescent plasma treatment vs placebo or standard of care in peer-reviewed and preprint publications or press releases of randomized clinical trials (RCTs). DATA SOURCES: PubMed, the Cochrane COVID-19 trial registry, and the Living Overview of Evidence platform were searched until January 29, 2021. STUDY SELECTION: The RCTs selected compared any type of convalescent plasma vs placebo or standard of care for patients with confirmed or suspected COVID-19 in any treatment setting. DATA EXTRACTION AND SYNTHESIS: Two reviewers independently extracted data on relevant clinical outcomes, trial characteristics, and patient characteristics and used the Cochrane Risk of Bias Assessment Tool. The primary analysis included peer-reviewed publications of RCTs only, whereas the secondary analysis included all publicly available RCT data (peer-reviewed publications, preprints, and press releases). Inverse variance–weighted meta-analyses were conducted to summarize the treatment effects. The certainty of the evidence was assessed using the Grading of Recommendations Assessment, Development, and Evaluation. MAIN OUTCOMES AND MEASURES: All-cause mortality, length of hospital stay, clinical improvement, clinical deterioration, mechanical ventilation use, and serious adverse events. RESULTS: A total of 1060 patients from 4 peer-reviewed RCTs and 10 722 patients from 6 other publicly available RCTs were included. The summary risk ratio (RR) for all-cause mortality with convalescent plasma in the 4 peer-reviewed RCTs was 0.93 (95% CI, 0.63 to 1.38), the absolute risk difference was −1.21% (95% CI, −5.29% to 2.88%), and there was low certainty of the evidence due to imprecision. Across all 10 RCTs, the summary RR was 1.02 (95% CI, 0.92 to 1.12) and there was moderate certainty of the evidence due to inclusion of unpublished data.
Among the peer-reviewed RCTs, the summary hazard ratio was 1.17 (95% CI, 0.07 to 20.34) for length of hospital stay, the summary RR was 0.76 (95% CI, 0.20 to 2.87) for mechanical ventilation use (the absolute risk difference for mechanical ventilation use was −2.56% [95% CI, −13.16% to 8.05%]), and there was low certainty of the evidence due to imprecision for both outcomes. Limited data on clinical improvement, clinical deterioration, and serious adverse events showed no significant differences. CONCLUSIONS AND RELEVANCE: Treatment with convalescent plasma compared with placebo or standard of care was not significantly associated with a decrease in all-cause mortality or with any benefit for other clinical outcomes. The certainty of the evidence was low to moderate for all-cause mortality and low for other outcomes.
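The abstract above names inverse variance-weighted meta-analysis but does not say whether a fixed-effect or random-effects model was used. The sketch below assumes the simpler fixed-effect case, pooling log risk ratios with weights equal to the inverse of their variances. The function names and trial counts are hypothetical, invented for illustration, and are not data from the included RCTs.

```python
import math


def log_rr_and_se(events_t, n_t, events_c, n_c):
    """Log risk ratio and its standard error from 2x2 counts."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return log_rr, se


def pool_inverse_variance(estimates, z=1.96):
    """Fixed-effect pooling (an assumption) of (log_rr, se) pairs.

    Returns the pooled RR and the bounds of its 95% confidence interval.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    return (
        math.exp(pooled),
        math.exp(pooled - z * pooled_se),
        math.exp(pooled + z * pooled_se),
    )


# Hypothetical counts: (deaths treated, n treated, deaths control, n control).
trials = [(20, 100, 25, 100), (15, 80, 14, 82), (40, 300, 42, 295)]
rr, lower, upper = pool_inverse_variance([log_rr_and_se(*t) for t in trials])
print(f"Summary RR {rr:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

Because each weight is the inverse of a trial's variance, large trials dominate the pooled estimate, which may help explain why a summary across many more patients can have a much narrower interval than one across a few small trials.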
Objectives: Between-study heterogeneity plays an important role in random-effects models for meta-analysis. Most clinical trials are small, and small trials are often associated with larger effect sizes. We empirically evaluated whether there is also a relationship between trial size and heterogeneity (τ). Study Design and Setting: We selected the first meta-analysis per intervention review of the Cochrane Database of Systematic Reviews Issues 2009–2013 with a dichotomous (n = 2,009) or continuous (n = 1,254) outcome. The association between estimated τ and trial size was evaluated across meta-analyses using regression and within meta-analyses using a Bayesian approach. Small trials were predefined as those having standard errors (SEs) over 0.2 standardized effects. Results: Most meta-analyses were based on few (median 4) trials. Within the same meta-analysis, the small-study τ_S² was larger than the large-study τ_L² (average ratio 2.11; 95% credible interval [1.05, 3.87] for dichotomous and 3.11 [2.00, 4.78] for continuous meta-analyses). The imprecision of τ_S was larger than that of τ_L: median SE 0.39 vs. 0.20 for dichotomous and 0.22 vs. 0.13 for continuous small-study and large-study meta-analyses. Conclusion: Heterogeneity between small studies is larger than between larger studies. The large imprecision with which τ is estimated in a typical small-studies' meta-analysis is another reason for concern, and sensitivity analyses are recommended.
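The study above estimated heterogeneity within meta-analyses using a Bayesian approach, which is not reproduced here. As a simpler stand-in, the sketch below uses the DerSimonian-Laird moment estimator of τ² and applies it separately to small and large trials, split at the SE threshold of 0.2 stated in the abstract. The function names and the effect data are hypothetical, chosen only to illustrate the comparison.

```python
def dersimonian_laird_tau2(effects, ses):
    """DerSimonian-Laird moment estimate of between-study variance (tau^2).

    This is a swapped-in frequentist estimator, not the Bayesian model
    used in the study.
    """
    if len(effects) < 2:
        raise ValueError("at least two studies are needed to estimate tau^2")
    weights = [1 / se ** 2 for se in ses]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w ** 2 for w in weights) / total
    return max(0.0, (q - (len(effects) - 1)) / c)


def tau2_by_trial_size(effects, ses, se_cutoff=0.2):
    """Estimate tau^2 separately for small (SE > cutoff) and large trials."""
    small = [(y, s) for y, s in zip(effects, ses) if s > se_cutoff]
    large = [(y, s) for y, s in zip(effects, ses) if s <= se_cutoff]
    tau2_small = dersimonian_laird_tau2(*zip(*small))
    tau2_large = dersimonian_laird_tau2(*zip(*large))
    return tau2_small, tau2_large


# Hypothetical standardized effects and standard errors, for illustration only.
effects = [0.80, -0.10, 0.45, 0.20, 0.25, 0.15]
ses = [0.30, 0.35, 0.25, 0.10, 0.12, 0.08]
tau2_small, tau2_large = tau2_by_trial_size(effects, ses)
print(f"small-trial tau^2 = {tau2_small:.3f}, large-trial tau^2 = {tau2_large:.3f}")
```

With only a handful of trials per subgroup, as in the typical meta-analysis described above (median 4 trials), such estimates are very imprecise, which is consistent with the abstract's recommendation of sensitivity analyses.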
Summary: Correctable weaknesses in the design, conduct, and analysis of biomedical and public health research studies can produce misleading results and waste valuable resources. Small effects can be difficult to distinguish from bias introduced by study design and analyses. An absence of detailed written protocols and poor documentation of research is common. Information obtained might not be useful or important, and statistical precision or power is often too low or used in a misleading way. Insufficient consideration might be given to both previous and continuing studies. Arbitrary choice of analyses and an overemphasis on random extremes might affect the reported findings. Several problems relate to the research workforce, including failure to involve experienced statisticians and methodologists, failure to train clinical researchers and laboratory scientists in research methods and design, and the involvement of stakeholders with conflicts of interest. Inadequate emphasis is placed on recording of research decisions and on reproducibility of research. Finally, reward systems incentivise quantity more than quality, and novelty more than reliability. We propose potential solutions for these problems, including improvements in protocols and documentation, consideration of evidence from studies in progress, standardisation of research efforts, optimisation and training of an experienced and non-conflicted scientific workforce, and reconsideration of scientific reward systems.