Increasingly, the statistical and epidemiologic literature is focusing beyond issues of internal validity and turning its attention to questions of external validity. Here, we discuss some of the challenges of transporting a causal effect from a randomized trial to a specific target population. We present an inverse odds weighting approach that can easily operationalize transportability. We derive these weights in closed form and illustrate their use with a simple numerical example. We discuss how the conditions required for the identification of internally valid causal effects are translated to apply to the identification of externally valid causal effects. Estimating effects in target populations is an important goal, especially for policy or clinical decisions. Researchers and policy-makers should therefore consider use of statistical techniques such as inverse odds of sampling weights, which under careful assumptions can transport effect estimates from study samples to target populations.
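As a rough illustration of the inverse odds of sampling weights described above, the following Python sketch weights trial participants by the odds of not being in the trial, given a single binary covariate Z. All counts are hypothetical, the function names are invented for illustration, and the stratified (nonparametric) estimate of the sampling probability stands in for whatever model a real analysis would use. It assumes the target sample is disjoint from the trial and represents the target population; it is not the numerical example from the article.

```python
# Hypothetical counts of people in each covariate stratum Z.
n_trial = {0: 800, 1: 200}    # trial participants (S = 1)
n_target = {0: 400, 1: 600}   # sample from the target population (S = 0)

# Hypothetical trial results: (z, arm) -> (events, participants).
trial_cells = {
    (0, "treated"): (40, 400),
    (0, "control"): (80, 400),
    (1, "treated"): (30, 100),
    (1, "control"): (60, 100),
}


def inverse_odds_weights(n_trial, n_target):
    """Weight for trial participants in stratum z: P(S=0 | z) / P(S=1 | z)."""
    weights = {}
    for z in n_trial:
        p_trial = n_trial[z] / (n_trial[z] + n_target[z])
        weights[z] = (1 - p_trial) / p_trial
    return weights


def weighted_risk(cells, weights, arm):
    """Weighted proportion with the outcome among trial participants in one arm."""
    events = sum(weights[z] * e for (z, a), (e, n) in cells.items() if a == arm)
    total = sum(weights[z] * n for (z, a), (e, n) in cells.items() if a == arm)
    return events / total


weights = inverse_odds_weights(n_trial, n_target)
unweighted = {z: 1.0 for z in n_trial}

# Risk difference in the trial as sampled, and after weighting to the target.
rd_trial = (weighted_risk(trial_cells, unweighted, "treated")
            - weighted_risk(trial_cells, unweighted, "control"))
rd_transported = (weighted_risk(trial_cells, weights, "treated")
                  - weighted_risk(trial_cells, weights, "control"))

print(f"trial RD: {rd_trial:.2f}, transported RD: {rd_transported:.2f}")
```

With these made-up numbers the trial risk difference is -0.14 and the transported risk difference is -0.22, because the stratum with the larger effect (Z = 1) is more common in the target population than in the trial.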
Objective: Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting: We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results: We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees, CART), and meta-classifiers (in particular, boosting). Conclusion: Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice.
Epidemiology in the Era of Big Data. Mooney, Stephen J; Westreich, Daniel J; El-Sayed, Abdulrahman M
Epidemiology (Cambridge, Mass.), 05/2015, Volume 26, Issue 3. Journal Article. Peer reviewed.
Big Data has increasingly been promoted as a revolutionary development in the future of science, including epidemiology. However, the definition and implications of Big Data for epidemiology remain unclear. We here provide a working definition of Big Data predicated on the so-called ‘3 Vs’: variety, volume, and velocity. From this definition, we argue that Big Data has evolutionary and revolutionary implications for identifying and intervening on the determinants of population health. We suggest that as more sources of diverse data become publicly available, the ability to combine and refine these data to yield valid answers to epidemiologic questions will be invaluable. We conclude that, while epidemiology as practiced today will continue to be practiced in the Big Data future, a component of our field’s future value lies in integrating subject matter knowledge with increased technical savvy. Our training programs and our visions for future public health interventions should reflect this future.
Machine learning is gaining prominence in the health sciences, where much of its use has focused on data-driven prediction. However, machine learning can also be embedded within causal analyses, potentially reducing biases arising from model misspecification. Using a question-and-answer format, we provide an introduction and orientation for epidemiologists interested in using machine learning but concerned about potential bias or loss of rigor due to use of “black box” models. We conclude with sample software code that may lower the barrier to entry to using these techniques.
Although Berkson's bias is widely recognized in the epidemiologic literature, it remains underappreciated as a model of both selection bias and bias due to missing data. Simple causal diagrams and 2 × 2 tables illustrate how Berkson's bias connects to collider bias and selection bias more generally, and show the strong analogies between Berksonian selection bias and bias due to missing data. In some situations, considerations of whether data are missing at random or missing not at random are less important than the causal structure of the missing data process. Although dealing with missing data always relies on strong assumptions about unobserved variables, the intuitions built with simple examples can provide a better understanding of approaches to missing data in real-world situations.
Women and HIV in the United States. Breskin, Alexander; Adimora, Adaora A; Westreich, Daniel
PloS one, 02/2017, Volume 12, Issue 2. Journal Article. Peer reviewed. Open access.
The demographic and geographic characteristics of the HIV epidemic in the US have changed substantially since the disease emerged, with women in the South experiencing a particularly high HIV incidence. In this study, we identified and described counties in the US in which the prevalence of HIV is particularly high in women compared to men.
Using data from AIDSVu, a public dataset of HIV cases in the US in 2012, we categorized counties by their decile of the ratio of female to male HIV prevalence. The demographic and socioeconomic characteristics of counties in the highest decile were compared to those of counties in the lower deciles.
Most of the counties in the highest decile were located in the Deep South. These counties had a lower median income, higher percentage of people in poverty, and lower percentage of people with a high school education. Additionally, people with HIV in these counties were more likely to be non-Hispanic black.
Counties with the highest ratios of female-to-male HIV prevalence are concentrated in the Southern US, and residents of these counties tend to be of lower socioeconomic status. Identifying and describing these counties is important for developing public health interventions.
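The decile categorization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the county names and prevalence values are invented (they are not AIDSVu data), the helper name `decile_of_ratio` is hypothetical, and the cut points use the default method of `statistics.quantiles`, which may differ from the rule used in the study.

```python
import statistics
from bisect import bisect_left


def decile_of_ratio(prevalence_by_county):
    """Map each county to the decile (1-10) of its female-to-male prevalence ratio.

    prevalence_by_county: {county: (female_prevalence, male_prevalence)}
    """
    ratios = {
        county: female / male
        for county, (female, male) in prevalence_by_county.items()
    }
    # Nine cut points that split the observed ratios into ten groups.
    cuts = statistics.quantiles(ratios.values(), n=10)
    # A ratio lying above k cut points falls in decile k + 1.
    return {county: bisect_left(cuts, r) + 1 for county, r in ratios.items()}


# Hypothetical prevalence values (cases per 100,000) for 20 made-up counties.
counties = {f"county_{i}": (10.0 * (i + 1), 100.0) for i in range(20)}

deciles = decile_of_ratio(counties)
highest = sorted(c for c, d in deciles.items() if d == 10)
print(highest)
```

Counties in decile 10 would then be compared with those in deciles 1 through 9 on demographic and socioeconomic characteristics.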
In the absence of strong assumptions (e.g., exchangeability), only bounds for causal effects can be identified. Here we describe bounds for the risk difference for an effect of a binary exposure on a binary outcome in 4 common study settings: observational studies and randomized studies, each with and without simple random selection from the target population. Through these scenarios, we introduce randomizations for selection and treatment, and the widths of the bounds are narrowed from 2 (the width of the range of the risk difference) to 0 (point identification). We then assess the strength of the assumptions of exchangeability for internal and external validity by comparing their contributions to the widths of the bounds in the setting of an observational study without random selection from the target population. We find that when less than two-thirds of the target population is selected into the study, the assumption of exchangeability for external validity of the risk difference is stronger than that for internal validity. The relative strength of these assumptions should be considered when designing, analyzing, and interpreting observational studies and will aid in determining the best methods for estimating the causal effects of interest.
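To make the idea of bounds concrete, the Python sketch below computes the standard no-assumption bounds on the risk difference for one of the four settings, an observational study that is a simple random sample of the target population. The counts are hypothetical and the function names are invented for illustration. The second function encodes one reading of the width comparison that is consistent with the two-thirds threshold stated above (the selected fraction p contributes width p, the unselected fraction contributes width 2(1 - p)); this is an assumption, and the article's own derivation may be framed differently.

```python
def rd_bounds_observational(n11, n10, n01, n00):
    """No-assumption bounds on the risk difference.

    n_ay is the number of people with exposure a and outcome y.
    """
    n = n11 + n10 + n01 + n00
    p_y1_a1 = n11 / n          # P(Y=1, A=1)
    p_y1_a0 = n01 / n          # P(Y=1, A=0)
    p_a1 = (n11 + n10) / n     # P(A=1)
    p_a0 = 1 - p_a1            # P(A=0)

    # Risk under exposure: the unexposed could all have had Y=0 or all Y=1.
    r1_low, r1_high = p_y1_a1, p_y1_a1 + p_a0
    # Risk under no exposure: the same argument applied to the exposed.
    r0_low, r0_high = p_y1_a0, p_y1_a0 + p_a1

    return r1_low - r0_high, r1_high - r0_low


def width_contributions(p_selected):
    """Assumed decomposition of the bound width (total is 2 - p_selected)."""
    internal = p_selected * 1.0          # unknown counterfactuals among the selected
    external = (1.0 - p_selected) * 2.0  # nothing observed for the unselected
    return internal, external


# Hypothetical 2 x 2 table: 1000 people, half of them exposed.
lower, upper = rd_bounds_observational(n11=150, n10=350, n01=100, n00=400)
print(f"risk difference bounds: [{lower:.2f}, {upper:.2f}]")

for p in (0.5, 0.8):
    internal, external = width_contributions(p)
    print(f"p = {p}: internal {internal:.2f}, external {external:.2f}")
```

For the hypothetical table the bounds are [-0.45, 0.55], a width of 1, which is always the case in this setting because the two unobserved portions sum to the whole sample. Under the assumed decomposition, the external contribution exceeds the internal one exactly when p is below two-thirds.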
Whether use of various types of hormonal contraception (HC) affects risk of HIV acquisition is a critical question for women's health. For this systematic review, we identified 22 studies published by January 15, 2014 which met inclusion criteria; we classified thirteen studies as having severe methodological limitations, and nine studies as “informative but with important limitations”. Overall, data do not support an association between use of oral contraceptives and increased risk of HIV acquisition. Uncertainty persists regarding whether an association exists between depot-medroxyprogesterone acetate (DMPA) use and risk of HIV acquisition. Most studies suggested no significantly increased HIV risk with norethisterone enanthate (NET-EN) use, but when assessed in the same study, point estimates for NET-EN tended to be larger than for DMPA, though 95% confidence intervals overlapped substantially. No data have suggested significantly increased risk of HIV acquisition with use of implants, though data were limited. No data are available on the relationship between use of contraceptive patches, rings, or hormonal intrauterine devices and risk of HIV acquisition. Women choosing progestin-only injectable contraceptives such as DMPA or NET-EN should be informed of the current uncertainty regarding whether use of these methods increases risk of HIV acquisition, and like all women at risk of HIV, should be empowered to access and use condoms and other HIV preventative measures. Programs, practitioners, and women urgently need guidance on how to maximize health with respect to avoiding both unintended pregnancy and HIV given inconclusive or limited data for certain HC methods.