Electronic health records (EHRs) are widely used in epidemiological research, but the validity of the results is dependent upon the assumptions made about the healthcare system, the patient, and the provider. In this review, we identify four overarching challenges in using EHR-based data for epidemiological analysis, with a particular emphasis on threats to validity. These challenges include representativeness of the EHR to a target population, the availability and interpretability of clinical and non-clinical data, and missing data at both the variable and observation levels. Each challenge reveals layers of assumptions that the epidemiologist is required to make, from the point of patient entry into the healthcare system, to the provider documenting the results of the clinical exam and follow-up of the patient longitudinally; all with the potential to bias the results of analysis of these data. Understanding the extent of potential biases, as well as remediating them, requires a variety of methodological approaches, from traditional sensitivity analyses and validation studies, to newer techniques such as natural language processing. Beyond methods to address these challenges, it will remain crucial for epidemiologists to engage with clinicians and informaticians at their institutions to ensure data quality and accessibility by forming multidisciplinary teams around specific research projects.
Electronic health records (EHRs) have become ubiquitous in clinical practice. Given the rich biomedical data captured for a large panel of patients, secondary analysis of these data for health research is also commonplace. Yet, there are many caveats to EHR data that researchers must be aware of, such as the accuracy of and motive for documentation, and the reason for patients’ visits to the clinic. The clinician, as the author of the documentation, is thus central to the correct interpretation of EHR data for research purposes. In this study, I interviewed 11 physicians in various clinical specialties to bring attention to their views on the validity of research using EHR data. Qualitative, in-depth, one-on-one interviews were conducted with practicing physicians in inpatient and outpatient medicine. Content analysis using a data-driven, inductive approach to identify themes related to challenges and opportunities in the reuse of EHR data for secondary analysis generated seven themes. Themes that reflected challenges of EHRs for research included (1) audience, (2) accuracy of data, (3) availability of data, (4) documentation practices, and (5) representativeness. Themes that reflected opportunities of EHRs for research included (6) endorsement and (7) enablers. The greatest perceived barriers reflected the intended audience of the EHR, the interpretation and meaning of the data, and the quality of the data for research purposes. Physicians generally expressed more perceived challenges than opportunities in the reuse of EHR data for research purposes; however, they remained optimistic.
4. Misinformation
Goldstein, Neal D
American journal of public health (1971), 02/2021, Volume: 111, Issue: 2
Journal Article, Peer reviewed, Open access
In their article, the authors acknowledged that misinformation (disseminated via social media) is damaging and sows distrust in public health: this has been well established.2 Misinformation and its more nefarious relative, disinformation, are indeed a problem for public health scientists whose interest is promoting health. AstraZeneca's release of their coronavirus disease 2019 vaccine clinical trial protocol is a proactive example (an "inoculant" in the framework's terminology) of transparency to strengthen public confidence.5 An open and transparent science is crucial in the era of the "reproducibility crisis." From "infodemics" to health promotion: a novel framework for the role of social media in public health.
Background:
Surveillance data captured during the COVID-19 pandemic may not be optimal to inform a public health response, because it is biased by imperfect test accuracy, differential access to testing, and uncertainty in date of infection.
Methods:
We downloaded COVID-19 time-series surveillance data from the Colorado Department of Public Health & Environment by report and illness onset dates for 9 March 2020 to 30 September 2020. We used existing Bayesian methods to first adjust for misclassification in testing and surveillance, followed by deconvolution of date of infection. We propagated forward uncertainty from each step corresponding to 10,000 posterior time-series of doubly adjusted epidemic curves. The effective reproduction number (R_t), a parameter of principal interest in tracking the pandemic, gauged the impact of the adjustment on inference.
Results:
Observed period prevalence was 1.3%; median of the posterior of true (adjusted) prevalence was 1.7% (95% credible interval [CrI]: 1.4%, 1.8%). Sensitivity of surveillance declined over the course of the epidemic from a median of 88.8% (95% CrI: 86.3%, 89.8%) to a median of 60.8% (95% CrI: 60.1%, 62.6%). The mean (minimum, maximum) values of R_t were higher and more variable by report date, 1.12 (0.77, 4.13), compared to those following adjustment, 1.05 (0.89, 1.73). The epidemic curve by report date tended to overestimate R_t early on and be more susceptible to fluctuations in data.
Conclusion:
Adjusting epidemic curves based on surveillance data is necessary if estimates of missed cases and the effective reproduction number play a role in management of the COVID-19 pandemic.
Despite widespread use, the accuracy of the diagnostic test for SARS-CoV-2 infection is poorly understood. The aim of our work was to better quantify misclassification errors in identification of true cases of COVID-19 and to study the impact of these errors in epidemic curves using publicly available surveillance data from Alberta, Canada and Philadelphia, USA.
We examined time-series data of laboratory tests for SARS-CoV-2 viral infection, the causal agent for COVID-19, to explore, using a Bayesian approach, the sensitivity and specificity of the diagnostic test.
Our analysis revealed that the data were compatible with near-perfect specificity, but it was challenging to gain information about sensitivity. We applied these insights to uncertainty/bias analysis of epidemic curves under the assumptions of both improving and degrading sensitivity. If the sensitivity improved from 60% to 95%, the adjusted epidemic curves likely fall within the 95% confidence intervals of the observed counts. However, bias in the shape and peak of the epidemic curves can be pronounced if sensitivity either degrades or remains poor in the 60-70% range. In the extreme scenario, hundreds of undiagnosed cases, even among the tested, are possible, potentially leading to further unchecked contagion should these cases not self-isolate.
The best way to better understand bias in the epidemic curves of COVID-19 due to errors in testing is to empirically evaluate misclassification of diagnosis in clinical settings and apply this knowledge to adjustment of epidemic curves.
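As a rough illustration of the kind of adjustment described in this abstract, the sketch below rescales observed daily counts by an assumed test sensitivity that improves linearly from 60% to 95%, with specificity taken as perfect. It is a simple deterministic stand-in, not the Bayesian analysis the authors used; the function names and the daily counts are hypothetical and chosen only for illustration.

```python
def linear_sensitivity(n_days, start, end):
    """Assumed sensitivity per day, changing linearly from start to end."""
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    if n_days == 1:
        return [start]
    step = (end - start) / (n_days - 1)
    return [start + step * day for day in range(n_days)]


def adjust_counts(observed, sensitivity):
    """Expected true cases among the tested, assuming perfect specificity.

    With no false positives, observed = true * sensitivity, so
    true = observed / sensitivity.
    """
    if len(observed) != len(sensitivity):
        raise ValueError("observed and sensitivity must have equal length")
    if any(not 0.0 < se <= 1.0 for se in sensitivity):
        raise ValueError("sensitivity values must be in (0, 1]")
    return [obs / se for obs, se in zip(observed, sensitivity)]


# Hypothetical daily positive counts (not taken from the study data).
observed = [12, 30, 57, 80, 95]
sensitivity = linear_sensitivity(len(observed), 0.60, 0.95)
adjusted = adjust_counts(observed, sensitivity)
missed = [adj - obs for adj, obs in zip(adjusted, observed)]

for day, (obs, adj) in enumerate(zip(observed, adjusted), start=1):
    print(f"day {day}: observed={obs}, adjusted={adj:.1f}")
```

Because the correction divides by sensitivity, days with poor assumed sensitivity contribute the most missed cases, which is why the shape and peak of the adjusted curve can differ from the observed one when sensitivity stays low.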
The use of publicly available sequencing datasets as controls (hereafter, "public controls") in studies of rare variant disease associations has great promise but can increase the risk of false-positive discovery. The specific factors that could contribute to inflated distribution of test statistics have not been systematically examined. Here, we leveraged both public controls (gnomAD v2.1) and several datasets sequenced in our laboratory to systematically investigate factors that could contribute to the false-positive discovery, as measured by λΔ95, a measure to quantify the degree of inflation in statistical significance. Analyses of datasets in this investigation found that 1) the significantly inflated distribution of test statistics decreased substantially when the same variant caller and filtering pipelines were employed, 2) differences in library prep kits and sequencers did not affect the false-positive discovery rate, and 3) joint vs. separate variant-calling of cases and controls did not contribute to the inflation of test statistics. Currently available methods do not adequately adjust for the high false-positive discovery. These results, especially if replicated, emphasize the risks of using public controls for rare-variant association tests in which individual-level data and the computational pipeline are not readily accessible, which prevents the use of the same variant-calling and filtering pipelines on both cases and controls. A plausible solution exists with the emergence of cloud-based computing, which can make it possible to bring containerized analytical pipelines to the data (rather than the data to the pipeline) and could avert or minimize these issues. It is suggested that future reports account for this issue and provide this as a limitation in reporting new findings based on studies that cannot practically analyze all data on a single pipeline.
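The abstract does not define λΔ95, so the sketch below instead computes the conventional genomic inflation factor (the median observed 1-df chi-square statistic divided by its expected median under the null) as a related, simpler way to see what "inflated distribution of test statistics" means. This is an assumed stand-in, not the authors' measure; the function names and the synthetic p-values are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom, obtained
# as the squared 75th percentile of the standard normal (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2_1df(p):
    """Convert a two-sided p-value to the matching 1-df chi-square statistic."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-values must be in (0, 1]")
    return _STD_NORMAL.inv_cdf(1.0 - p / 2.0) ** 2


def genomic_inflation(p_values):
    """Conventional genomic inflation factor (not the abstract's λΔ95)."""
    chi2 = [p_to_chi2_1df(p) for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Synthetic p-values: an evenly spaced grid mimics a well-calibrated test,
# and squaring each value skews the set toward small p-values.
n_tests = 1001
calibrated = [(i + 0.5) / n_tests for i in range(n_tests)]
inflated = [p ** 2 for p in calibrated]

lambda_calibrated = genomic_inflation(calibrated)  # close to 1
lambda_inflated = genomic_inflation(inflated)      # greater than 1
print(lambda_calibrated, lambda_inflated)
```

A value near 1 indicates test statistics that follow the null distribution; values well above 1 indicate the kind of systematic inflation the abstract attributes to mismatched variant-calling and filtering pipelines.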
Early in the COVID-19 pandemic, routine sexually transmitted infection (STI) screenings decreased, and test positivity rates increased due to limited screening appointments, national-level STI testing supply shortages, and social distancing mandates. It is unclear if adolescent preventive STI screening has returned to pre-pandemic levels and if pre-existing disparities worsened in the late-pandemic period.
This cross-sectional study examined 22,974 primary care visits by 13–19-year-olds in the Philadelphia metropolitan area undergoing screening for gonorrhea and chlamydia in a 31-clinic pediatric primary care network during 2018–2022. Using interrupted-time-series analysis and logistic regression, pandemic-related changes in the asymptomatic STI screening rate and test positivity were tracked across patient demographics. Neighborhood moderation was investigated by census-tract-level Child Opportunity Index in 2023.
The asymptomatic STI screening rate dropped by 27.8 percentage points (pp) and 13.5 pp when the pandemic and national STI test supply shortage began, respectively, but returned to pre-pandemic levels after supply availability was restored in early 2021. Non-Hispanic-Black adolescents had a significant pandemic drop in STI screening rate, and it did not return to pre-pandemic levels (−3.6 pp in the late-pandemic period, p<0.01). This decrease was more pronounced in socioeconomically and educationally disadvantaged neighborhoods (7.5 pp and 9.9 pp lower, respectively) than in advantaged neighborhoods (both p<0.001), controlling for sex, age, insurance type and clinic characteristics.
Neighborhood socioeconomic and educational disadvantage amplified racial-ethnic disparities in STI screening during the pandemic. Future interventions should focus on improving primary care utilization of non-Hispanic-Black adolescents to increase routine STI screening and preventive care utilization.