Abstract
Machine learning is a branch of computer science that has the potential to transform epidemiologic sciences. Amid a growing focus on “Big Data,” it offers epidemiologists new tools to tackle problems for which classical methods are not well-suited. In order to critically evaluate the value of integrating machine learning algorithms and existing methods, however, it is essential to address language and technical barriers between the two fields that can make it difficult for epidemiologists to read and assess machine learning studies. Here, we provide an overview of the concepts and terminology used in machine learning literature, which encompasses a diverse set of tools with goals ranging from prediction to classification to clustering. We provide a brief introduction to 5 common machine learning algorithms and 4 ensemble-based approaches. We then summarize epidemiologic applications of machine learning techniques in the published literature. We recommend approaches to incorporate machine learning in epidemiologic research and discuss opportunities and challenges for integrating machine learning and existing epidemiologic research methods.
A novel human coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was identified in China in December 2019. There is limited support for many of its key epidemiologic features, including the incubation period for clinical disease (coronavirus disease 2019 [COVID-19]), which has important implications for surveillance and control activities.
To estimate the length of the incubation period of COVID-19 and describe its public health implications.
Pooled analysis of confirmed COVID-19 cases reported between 4 January 2020 and 24 February 2020.
News reports and press releases from 50 provinces, regions, and countries outside Wuhan, Hubei province, China.
Persons with confirmed SARS-CoV-2 infection outside Hubei province, China.
Patient demographic characteristics and dates and times of possible exposure, symptom onset, fever onset, and hospitalization.
There were 181 confirmed cases with identifiable exposure and symptom onset windows to estimate the incubation period of COVID-19. The median incubation period was estimated to be 5.1 days (95% CI, 4.5 to 5.8 days), and 97.5% of those who develop symptoms will do so within 11.5 days (CI, 8.2 to 15.6 days) of infection. These estimates imply that, under conservative assumptions, 101 out of every 10 000 cases (99th percentile, 482) will develop symptoms after 14 days of active monitoring or quarantine.
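The tail probability in the last sentence can be illustrated by assuming a log-normal incubation period (a common choice for incubation-period data; the abstract does not state which distribution was fitted) and recovering its two parameters from the median and 97.5th percentile above. This is a minimal sketch using point estimates only; it does not attempt to reproduce the conservative figure of 101 per 10 000.

```python
import math
from statistics import NormalDist

# Point estimates quoted in the abstract (days since infection).
MEDIAN_DAYS = 5.1
P975_DAYS = 11.5

def lognormal_params(median, p975):
    """Recover (mu, sigma) of an assumed log-normal from its median and 97.5th percentile."""
    mu = math.log(median)               # the median of a log-normal is exp(mu)
    z975 = NormalDist().inv_cdf(0.975)  # standard-normal 97.5th percentile, about 1.96
    sigma = (math.log(p975) - mu) / z975
    return mu, sigma

def prob_onset_after(days, mu, sigma):
    """P(incubation period > days) under the assumed log-normal model."""
    return 1.0 - NormalDist().cdf((math.log(days) - mu) / sigma)

mu, sigma = lognormal_params(MEDIAN_DAYS, P975_DAYS)
p14 = prob_onset_after(14, mu, sigma)
print(f"sigma = {sigma:.3f}; cases per 10 000 with onset after day 14: {10_000 * p14:.0f}")
```

Using point estimates alone, the sketch yields a smaller count than the conservative 101 per 10 000 quoted above.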
Publicly reported cases may overrepresent severe cases, the incubation period for which may differ from that of mild cases.
This work provides additional evidence for a median incubation period for COVID-19 of approximately 5 days, similar to SARS. Our results support current proposals for the length of quarantine or active monitoring of persons potentially exposed to SARS-CoV-2, although longer monitoring periods might be justified in extreme cases.
U.S. Centers for Disease Control and Prevention, National Institute of Allergy and Infectious Diseases, National Institute of General Medical Sciences, and Alexander von Humboldt Foundation.
First discovered in 1947, Zika virus (ZIKV) infection remained a little-known tropical disease until 2015, when its apparent association with a considerable increase in the incidence of microcephaly in Brazil raised alarms worldwide. There is limited information on the key factors that determine the extent of the global threat from ZIKV infection and resulting complications. Here, we review what is known about the epidemiology, natural history, and public health effects of ZIKV infection, the empirical basis for this knowledge, and the critical knowledge gaps that need to be filled.
Dengue hemorrhagic fever (DHF), a severe manifestation of dengue viral infection that can cause severe bleeding, organ impairment, and even death, affects between 15,000 and 105,000 people each year in Thailand. While all Thai provinces experience at least one DHF case most years, the distribution of cases shifts regionally from year to year. Accurately forecasting where DHF outbreaks occur before the dengue season could help public health officials prioritize public health activities. We develop statistical models that use biologically plausible covariates, observed by April each year, to forecast the cumulative DHF incidence for the remainder of the year. We perform cross-validation during the training phase (2000–2009) to select the covariates for these models. A parsimonious model based on preseason incidence outperforms the 10-y median for 65% of province-level annual forecasts, reduces the mean absolute error by 19%, and successfully forecasts outbreaks (area under the receiver operating characteristic curve = 0.84) over the testing period (2010–2014). We find that functions of past incidence contribute most strongly to model performance, whereas the importance of environmental covariates varies regionally. This work illustrates that accurate forecasts of dengue risk are possible in a policy-relevant timeframe.
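The three comparisons against the 10-y median (share of forecasts that beat it, relative reduction in mean absolute error, and area under the ROC curve for outbreak detection) can be computed as follows. All numbers and the outbreak threshold are hypothetical, since the abstract does not say how outbreaks were labeled, and the AUC uses the Mann-Whitney formulation rather than tracing the curve point by point.

```python
# Hypothetical province-level values (cases per 100 000) for one forecast year.
baseline = [40.0, 25.0, 60.0, 15.0, 80.0]   # stand-in for each province's 10-y median
forecast = [55.0, 28.0, 58.0, 30.0, 95.0]   # model forecasts issued in April
observed = [60.0, 20.0, 45.0, 35.0, 120.0]  # incidence actually recorded
OUTBREAK_THRESHOLD = 50.0                   # hypothetical outbreak definition

def mean_absolute_error(pred, obs):
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def auc_mann_whitney(scores, labels):
    """ROC AUC as the probability that a random outbreak province scores higher
    than a random non-outbreak province (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

mae_baseline = mean_absolute_error(baseline, observed)
mae_model = mean_absolute_error(forecast, observed)
mae_reduction = 1 - mae_model / mae_baseline
share_better = sum(
    abs(f - o) < abs(b - o) for f, b, o in zip(forecast, baseline, observed)
) / len(observed)
labels = [o > OUTBREAK_THRESHOLD for o in observed]
auc = auc_mann_whitney(forecast, labels)
print(f"MAE reduction {mae_reduction:.0%}, model better in {share_better:.0%}, AUC {auc:.2f}")
```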
We assessed the added value and limitations of generating direct ZIP Code-level estimates by aggregating 5 years (2009 to 2013) of data from the New York City Community Health Survey (n = 44,886), an annual cross-sectional survey designed to provide reliable estimates only for larger geographies. Survey weights were used to generate directly observed estimates for ZIP Codes (n = 128). We assessed the heterogeneity of ZIP Code-level estimates within coarser United Hospital Fund (UHF) neighborhood areas (n = 34) by using the Rao-Scott chi-square test and one-way ANOVA. Orthogonal linear contrasts assessed whether there were linear trends at the UHF level from 2009 to 2013. Twenty-two of 37 health indicators were reliable in over 50% of ZIP Codes. Fourteen of the 22 variables showed heterogeneity in ≥4 UHFs. Variables for drinking, nutrition, and HIV testing showed heterogeneity in the most UHFs (9–24 UHFs). In half of the 32 UHFs, >20% of variables had within-UHF heterogeneity. Flu vaccination and sugary beverage consumption showed significant time trends in the largest number of UHFs (12 or more). Overall, the heterogeneity of ZIP Code-level estimates suggests that there is value in aggregating 5 years of data to make direct small area estimates.
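For a continuous indicator, a one-way ANOVA of ZIP Codes within a UHF compares between-ZIP and within-ZIP variation. The sketch below is unweighted and ignores the survey design, so it only illustrates the statistic itself; the design-adjusted Rao-Scott test for categorical indicators is not shown, and the data are hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its (numerator, denominator) df.

    `groups` holds respondent-level values, one list per ZIP Code; no survey
    weights or design effects are applied in this sketch.
    """
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = sum(values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of ZIP Code means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of respondents around their ZIP Code mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, (k - 1, n - k)

# Hypothetical values for three ZIP Codes within one UHF.
f_stat, df = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"F = {f_stat:.1f} on {df} degrees of freedom")
```

Converting F to a p-value requires the F distribution (for example `scipy.stats.f`), which is left out here to keep the sketch dependency-free.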
We investigated a multi-family cluster of 22 cases in Jixi, where pre-symptomatic and asymptomatic transmission resulted in at least 41% of household infections of SARS-CoV-2. Our study illustrates the challenge of controlling COVID-19 due to the presence of asymptomatic and pre-symptomatic transmission even when extensive testing and contact tracing are conducted.
Close interpersonal contact likely drives spatial clustering of cases of cholera and diarrhea, but spatial clustering of risk factors may also drive this pattern. Few studies have focused specifically on how exposures for disease cluster at small spatial scales. Improving our understanding of the micro-scale clustering of risk factors for cholera may help to target interventions and power studies with cluster designs. We selected sets of spatially matched households (matched-sets) near cholera case households between April and October 2013 in a cholera-endemic urban neighborhood of Tongi Township in Bangladesh. We collected data on exposures to suspected cholera risk factors at the household and individual level. We used intra-class correlation coefficients (ICCs) to characterize clustering of exposures within matched-sets and households, and assessed whether clustering depended on the geographical extent of the matched-sets. Clustering over larger spatial scales was explored by assessing the relationship between matched-sets. We also explored whether different exposures tended to appear together in individuals, households, and matched-sets. Household-level exposures, including drinking municipally supplied water (ICC = 0.97, 95% CI = 0.96, 0.98), type of latrine (ICC = 0.88, 95% CI = 0.71, 1.00), and intermittent access to drinking water (ICC = 0.96, 95% CI = 0.87, 1.00), exhibited strong clustering within matched-sets. As the geographic extent of matched-sets increased, the concordance of exposures within matched-sets decreased. Concordance of water-supply exposures between matched-sets was elevated at distances of up to approximately 400 meters. Household-level hygiene practices were correlated with infrastructure shown to increase cholera risk. Co-occurrence of different individual-level exposures appeared to mostly reflect the differing domestic roles of study participants.
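One standard way to obtain an ICC for an exposure clustered in matched-sets is the one-way ANOVA estimator. The abstract does not say which ICC estimator the study used, so treat this as an illustrative sketch that assumes equally sized matched-sets and 0/1 exposure indicators; the example data are hypothetical.

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the ICC for equally sized clusters.

    Each inner list holds one matched-set's exposure values (e.g. 1 = uses
    municipal water, 0 = does not); unequal sizes are rejected in this sketch.
    """
    k, m = len(clusters), len(clusters[0])
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equally sized clusters")
    grand = sum(x for c in clusters for x in c) / (k * m)
    means = [sum(c) / m for c in clusters]
    # Mean square between matched-sets.
    msb = m * sum((mean - grand) ** 2 for mean in means) / (k - 1)
    # Mean square within matched-sets.
    msw = sum((x - mean) ** 2 for c, mean in zip(clusters, means) for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

# Hypothetical exposure data for four matched-sets of three households.
print(anova_icc([[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]))
```

When households within a set almost always share the exposure, the within-set mean square shrinks toward zero and the estimate approaches 1.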
Strong spatial clustering of exposures at a small spatial scale in a cholera endemic population suggests a possible role for highly targeted interventions. Studies with cluster designs in areas with strong spatial clustering of exposures should increase sample size to account for the correlation of these exposures.
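The recommended sample-size increase is usually expressed through the design effect, 1 + (m - 1) × ICC, for clusters of m units each. The sketch below assumes equal cluster sizes; the planning scenario and function names are hypothetical, and the ICC of 0.88 is borrowed from the latrine-type result above purely for illustration.

```python
import math

def design_effect(icc, cluster_size):
    """Kish design effect for clusters of equal size: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def clustered_sample_size(n_independent, icc, cluster_size):
    """Inflate a sample size computed as if observations were independent."""
    return math.ceil(n_independent * design_effect(icc, cluster_size))

# Hypothetical scenario: 300 households would suffice if independent,
# and each matched-set contains 4 households.
deff = design_effect(0.88, 4)
n_needed = clustered_sample_size(300, 0.88, 4)
print(f"design effect {deff:.2f}; households needed {n_needed}")
```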
Rapid spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Wuhan, China, prompted heightened surveillance in Shenzhen, China. The resulting data provide a rare opportunity to measure key metrics of disease course, transmission, and the impact of control measures.
From Jan 14 to Feb 12, 2020, the Shenzhen Center for Disease Control and Prevention identified 391 SARS-CoV-2 cases and 1286 close contacts. We compared cases identified through symptomatic surveillance and contact tracing, and estimated the time from symptom onset to confirmation, isolation, and admission to hospital. We estimated metrics of disease transmission and analysed factors influencing transmission risk.
Cases were older than the general population (mean age 45 years) and balanced between males (n=187) and females (n=204). 356 (91%) of 391 cases had mild or moderate clinical severity at initial assessment. As of Feb 22, 2020, three cases had died and 225 had recovered (median time to recovery 21 days; 95% CI 20–22). Cases were isolated on average 4·6 days (95% CI 4·1–5·0) after developing symptoms; contact tracing reduced this by 1·9 days (95% CI 1·1–2·7). Household contacts and those travelling with a case were at higher risk of infection than other close contacts (odds ratio 6·27 [95% CI 1·49–26·33] for household contacts and 7·06 [1·43–34·91] for those travelling with a case). The household secondary attack rate was 11·2% (95% CI 9·1–13·8), and children were as likely to be infected as adults (infection rate 7·4% in children <10 years vs population average of 6·6%). The observed reproductive number (R) was 0·4 (95% CI 0·3–0·5), with a mean serial interval of 6·3 days (95% CI 5·2–7·6).
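The household secondary attack rate and the odds ratios above are both functions of contact counts. The sketch below computes a crude SAR with a Wilson score interval and a crude odds ratio with a Woolf (log-scale) interval from a hypothetical 2 × 2 table. The abstract does not state which interval methods or adjustments the study used, so these are illustrative choices, not a reconstruction of the analysis.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)

def wilson_interval(cases, contacts, z=Z95):
    """Attack rate with a Wilson score confidence interval."""
    p = cases / contacts
    denom = 1 + z ** 2 / contacts
    centre = (p + z ** 2 / (2 * contacts)) / denom
    half = z * math.sqrt(p * (1 - p) / contacts + z ** 2 / (4 * contacts ** 2)) / denom
    return p, centre - half, centre + half

def odds_ratio_woolf(a, b, c, d, z=Z95):
    """Crude odds ratio with a Woolf (log-scale) interval.

    a, b: infected / uninfected contacts in the exposed group (e.g. household)
    c, d: infected / uninfected contacts in the comparison group
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (odds_ratio,
            math.exp(math.log(odds_ratio) - z * se_log),
            math.exp(math.log(odds_ratio) + z * se_log))

# Hypothetical counts: 80 of 700 household contacts and 10 of 500 other contacts infected.
sar, sar_lo, sar_hi = wilson_interval(80, 700)
or_hh, or_lo, or_hi = odds_ratio_woolf(80, 620, 10, 490)
print(f"household SAR {sar:.1%} ({sar_lo:.1%}-{sar_hi:.1%}); OR {or_hh:.2f} ({or_lo:.2f}-{or_hi:.2f})")
```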
Our data on cases as well as their infected and uninfected close contacts provide key insights into the epidemiology of SARS-CoV-2. This analysis shows that isolation and contact tracing reduce the time during which cases are infectious in the community, thereby reducing the R. The overall impact of isolation and contact tracing, however, is uncertain and highly dependent on the number of asymptomatic cases. Moreover, children are at a similar risk of infection to the general population, although less likely to have severe symptoms; hence they should be considered in analyses of transmission and control.
Emergency Response Program of Harbin Institute of Technology, Emergency Response Program of Peng Cheng Laboratory, US Centers for Disease Control and Prevention.
Abstract
Background
Understanding the drivers of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission is crucial for control policies, but evidence of transmission rates in different settings remains limited.
Methods
We conducted a systematic review to estimate secondary attack rates (SARs) and observed reproduction numbers (Robs) in different settings, exploring differences by age, symptom status, and duration of exposure. To account for additional study heterogeneity, we used a beta-binomial model to pool SARs across studies and a negative-binomial model to estimate Robs.
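A beta-binomial model pools SARs by letting each study's true attack rate vary around a common mean. The sketch below assumes the common (mean mu, intra-study correlation rho) parameterization and fits it by a coarse grid search rather than a proper optimizer; the study counts are hypothetical, and the negative-binomial model for Robs is not shown.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_loglik(studies, mu, rho):
    """Log-likelihood of (secondary cases, contacts) pairs under a beta-binomial
    with mean attack rate mu and intra-study correlation rho, 0 < rho < 1."""
    scale = 1.0 / rho - 1.0                 # a + b, since rho = 1 / (a + b + 1)
    a, b = mu * scale, (1.0 - mu) * scale
    total = 0.0
    for k, n in studies:
        log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)
    return total

def pool_sar(studies, steps=100):
    """Coarse grid-search MLE of (mu, rho); mu in (0, 1), rho in (0, 0.5)."""
    best = None
    for i in range(1, steps):
        mu = i / steps
        for j in range(1, steps):
            rho = j / (2 * steps)
            ll = beta_binomial_loglik(studies, mu, rho)
            if best is None or ll > best[0]:
                best = (ll, mu, rho)
    return best[1], best[2]

# Hypothetical household studies: (secondary cases, household contacts).
studies = [(12, 60), (45, 150), (8, 80), (30, 110), (20, 70)]
mu_hat, rho_hat = pool_sar(studies)
print(f"pooled SAR about {mu_hat:.2f} (rho about {rho_hat:.3f})")
```

A larger rho means more between-study spread in attack rates than binomial sampling alone would produce, which widens the uncertainty around the pooled mean.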
Results
Households showed the highest transmission rates, with a pooled SAR of 21.1% (95% confidence interval [CI], 17.4–24.8). SARs were significantly higher where the duration of household exposure exceeded 5 days compared with exposure of ≤5 days. SARs related to contacts at social events with family and friends were higher than those for low-risk casual contacts (5.9% vs 1.2%). Estimates of SARs and Robs for asymptomatic index cases were approximately one-seventh, and for presymptomatic index cases two-thirds, of those for symptomatic index cases. We found some evidence of reduced transmission potential both from and to individuals younger than 20 years of age in the household context, although this evidence is more limited when all settings are examined.
Conclusions
Our results suggest that exposure in settings with familiar contacts increases SARS-CoV-2 transmission potential. Additionally, the differences observed in transmissibility by index case symptom status and duration of exposure have important implications for control strategies, such as contact tracing, testing, and rapid isolation of cases. There were limited data to explore transmission patterns in workplaces, schools, and care homes, highlighting the need for further research in such settings.
Killed whole-cell oral cholera vaccines (kOCVs) are becoming a standard cholera control and prevention tool. However, vaccine efficacy and direct effectiveness estimates have varied, with differences in study design, location, follow-up duration, and vaccine composition posing challenges for public health decision making. We did a systematic review and meta-analysis to generate average estimates of kOCV efficacy and direct effectiveness from the available literature.
For this systematic review and meta-analysis, we searched PubMed, Embase, Scopus, and the Cochrane Review Library on July 9, 2016, and ISI Web of Science on July 11, 2016, for randomised controlled trials and observational studies that reported estimates of direct protection against medically attended confirmed cholera conferred by kOCVs. We included studies published on any date in English, Spanish, French, or Chinese. We extracted from the published reports the primary efficacy and effectiveness estimates from each study and also estimates according to number of vaccine doses, duration, and age group. The main study outcome was average efficacy and direct effectiveness of two kOCV doses, which we estimated with random-effect models. This study is registered with PROSPERO, number CRD42016048232.
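The abstract specifies random-effects models but not the estimator. A common choice is the DerSimonian-Laird method applied to log risk ratios, with efficacy taken as 1 minus the pooled risk ratio; the sketch below assumes that approach, uses hypothetical study results, and also returns I², the heterogeneity statistic reported in the results.

```python
import math
from statistics import NormalDist

def dersimonian_laird(log_rr, se):
    """Random-effects pooling of log risk ratios (DerSimonian-Laird).

    Returns (pooled log RR, its standard error, tau^2, I^2).
    """
    w = [1 / s ** 2 for s in se]                                   # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))   # Cochran's Q
    df = len(log_rr) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                                  # between-study variance
    w_re = [1 / (s ** 2 + tau2) for s in se]                       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
    pooled_se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, tau2, i2

def efficacy_with_ci(pooled, pooled_se, level=0.95):
    """Efficacy = 1 - RR; the upper RR bound maps to the lower efficacy bound."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (1 - math.exp(pooled),
            1 - math.exp(pooled + z * pooled_se),
            1 - math.exp(pooled - z * pooled_se))

# Hypothetical two-dose trials: risk ratios (vaccinated vs unvaccinated) and SEs of log RR.
rr = [0.35, 0.50, 0.45, 0.70]
se = [0.20, 0.25, 0.30, 0.22]
pooled, pooled_se, tau2, i2 = dersimonian_laird([math.log(r) for r in rr], se)
eff, eff_lo, eff_hi = efficacy_with_ci(pooled, pooled_se)
print(f"efficacy {eff:.0%} (95% CI {eff_lo:.0%} to {eff_hi:.0%}), I2 = {i2:.0%}")
```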
Seven trials (with 695 patients with cholera) and six observational studies (217 patients with cholera) met the inclusion criteria, with an average two-dose efficacy of 58% (95% CI 42–69, I²=58%) and effectiveness of 76% (62–85, I²=0%). Average two-dose efficacy in children younger than 5 years (30% [95% CI 15–42], I²=0%) was lower than in those 5 years or older (64% [58–70], I²=0%; p<0·0001). Two-dose efficacy estimates of kOCV were similar during the first 2 years after vaccination, with estimates of 56% (95% CI 42–66, I²=45%) in the first year and 59% (49–67, I²=0%) in the second year. The efficacy decreased to 39% (13 to 57, I²=48%) in the third year and 26% (−46 to 63, I²=74%) in the fourth year.
Two kOCV doses provide protection against cholera for at least 3 years. One kOCV dose provides at least short-term protection, which has important implications for outbreak management. kOCVs are effective tools for cholera control.
The Bill & Melinda Gates Foundation.