Most questions in social and biomedical sciences are causal in nature: what would happen to individuals, or to groups, if part of their environment were changed? In this groundbreaking text, two world-renowned experts present statistical methods for studying such questions. The book starts with the notion of potential outcomes, each corresponding to the outcome that would be realized if a subject were exposed to a particular treatment or regime. In this approach, causal effects are comparisons of such potential outcomes. The fundamental problem of causal inference is that we can only observe one of the potential outcomes for a particular subject. The authors discuss how randomized experiments allow us to assess causal effects and then turn to observational studies. They lay out the assumptions needed for causal inference and describe the leading analysis methods, including matching, propensity-score methods, and instrumental variables. Many detailed applications are included, with special focus on practical aspects for the empirical researcher.
The book is divided into seven parts.
Part I, Introduction: (1) Causality: The Basic Framework; (2) A Brief History of the Potential Outcomes Approach to Causal Inference; (3) A Classification of Assignment Mechanisms.
Part II, Classical Randomized Experiments: (4) A Taxonomy of Classical Randomized Experiments; (5) Fisher's Exact P-Values for Completely Randomized Experiments; (6) Neyman's Repeated Sampling Approach to Completely Randomized Experiments; (7) Regression Methods for Completely Randomized Experiments; (8) Model-Based Inference for Completely Randomized Experiments; (9) Stratified Randomized Experiments; (10) Pairwise Randomized Experiments; (11) Case Study: An Experimental Evaluation of a Labor Market Program.
Part III, Regular Assignment Mechanisms: Design: (12) Unconfounded Treatment Assignment; (13) Estimating the Propensity Score; (14) Assessing Overlap in Covariate Distributions; (15) Matching to Improve Balance in Covariate Distributions; (16) Trimming to Improve Balance in Covariate Distributions.
Part IV, Regular Assignment Mechanisms: Analysis: (17) Subclassification on the Propensity Score; (18) Matching Estimators; (19) A General Method for Estimating Sampling Variances for Standard Estimators for Average Causal Effects; (20) Inference for General Causal Estimands.
Part V, Regular Assignment Mechanisms: Supplementary Analysis: (21) Assessing Unconfoundedness; (22) Sensitivity Analysis and Bounds.
Part VI, Regular Assignment Mechanisms with Noncompliance: Analysis: (23) Instrumental Variables Analysis of Randomized Experiments with One-Sided Noncompliance; (24) Instrumental Variables Analysis of Randomized Experiments with Two-Sided Noncompliance; (25) Model-Based Analysis in Instrumental Variable Settings: Randomized Experiments with Two-Sided Noncompliance.
Part VII, Conclusion: (26) Conclusions and Extensions.
References and an index are also provided.
In this essay I discuss potential outcome and graphical approaches to causality, and their relevance for empirical work in economics. I review some of the work on directed acyclic graphs, including the recent The Book of Why (Pearl and Mackenzie 2018). I also discuss the potential outcome framework developed by Rubin and coauthors (e.g., Rubin 2006), building on work by Neyman (1990 [1923]). I then discuss the relative merits of these approaches for empirical work in economics, focusing on the questions each framework answers well, and on why much of the work in economics is closer in spirit to the potential outcome perspective.
There is a large theoretical literature on methods for estimating causal effects under unconfoundedness, exogeneity, or selection-on-observables type assumptions using matching or propensity-score methods. Much of this literature is highly technical and has not made inroads into empirical practice, where many researchers continue to use simple methods such as ordinary least squares regression even in settings where those methods do not have attractive properties. In this paper, I discuss some of the lessons for practice from the theoretical literature and provide detailed recommendations on what to do. I illustrate the recommendations with three detailed applications.
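As a minimal illustration of the contrast this abstract draws, the sketch below compares a naive mean difference with simple 1-nearest-neighbor matching on a single covariate, in a simulated setting where treatment assignment depends on that covariate. The data-generating process and function names are hypothetical; this is not the paper's recommended procedure, only a toy example of selection on observables.

```python
import numpy as np

def att_nn_matching(y, w, x):
    """Estimate the ATT by 1-nearest-neighbor matching on a scalar covariate.

    y : outcomes, w : 0/1 treatment indicator, x : covariate.
    For each treated unit, the missing control outcome is imputed with the
    outcome of the closest control unit in x.
    """
    y, w, x = map(np.asarray, (y, w, x))
    treated = np.where(w == 1)[0]
    controls = np.where(w == 0)[0]
    effects = []
    for i in treated:
        j = controls[np.argmin(np.abs(x[controls] - x[i]))]
        effects.append(y[i] - y[j])
    return float(np.mean(effects))

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
w = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)  # selection on x
y = 2.0 * w + x + rng.normal(scale=0.1, size=n)               # true effect = 2

att = att_nn_matching(y, w, x)
naive = y[w == 1].mean() - y[w == 0].mean()
print(att, naive)  # matching is close to 2; the naive difference is biased upward
```

The naive difference absorbs the covariate imbalance between treated and control units, while matching compares units with similar x and recovers the effect.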
The use of statistical significance and p-values has become a matter of substantial controversy in various fields using statistical methods. This has gone as far as some journals banning the use of indicators for statistical significance, or even any reports of p-values, and, in one case, any mention of confidence intervals. I discuss three of the issues that have led to these often-heated debates. First, I argue that in many cases, p-values and indicators of statistical significance do not answer the questions of primary interest. Such questions typically involve making (recommendations on) decisions under uncertainty. In that case, point estimates and measures of uncertainty in the form of confidence intervals or, even better, Bayesian intervals are often more informative summary statistics. In fact, in that case, the presence or absence of statistical significance is essentially irrelevant, and including it in the discussion may confuse the matter at hand. Second, I argue that there are also cases where testing null hypotheses is a natural goal and where p-values are reasonable and appropriate summary statistics. I conclude that banning them in general is counterproductive. Third, I discuss how the overemphasis in empirical work on statistical significance has led to abuse of p-values in the form of p-hacking and publication bias. The use of pre-analysis plans and replication studies, in combination with lowering the emphasis on statistical significance, may help address these problems.
In this paper we study estimation of and inference for average treatment effects in a setting with panel data. We focus on the staggered adoption setting where units, e.g., individuals, firms, or states, adopt the policy or treatment of interest at a particular point in time, and then remain exposed to this treatment at all times afterwards. We take a design perspective where we investigate the properties of estimators and procedures given assumptions on the assignment process. We show that under random assignment of the adoption date the standard Difference-In-Differences (DID) estimator is an unbiased estimator of a particular weighted average causal effect. We characterize the exact finite-sample properties of this estimator, and show that the standard variance estimator is conservative.
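For the simplest two-period case, the standard DID estimator the abstract refers to differences out unit fixed effects and a common time trend. The sketch below uses made-up data and is only a toy version: the staggered-adoption setting in the paper averages over many adoption dates rather than a single pre/post comparison.

```python
import numpy as np

def did(y_pre, y_post, treated):
    """Two-period difference-in-differences estimate.

    treated : boolean indicator of units that adopt between the periods.
    The control group's change identifies the common trend, which is
    subtracted from the treated group's change.
    """
    treated = np.asarray(treated, dtype=bool)
    delta_t = y_post[treated].mean() - y_pre[treated].mean()
    delta_c = y_post[~treated].mean() - y_pre[~treated].mean()
    return float(delta_t - delta_c)

rng = np.random.default_rng(1)
n = 500
alpha = rng.normal(size=n)              # unit fixed effects
treated = rng.uniform(size=n) < 0.5     # adoption date randomly assigned
y_pre = alpha + rng.normal(scale=0.1, size=n)
y_post = alpha + 1.0 + 3.0 * treated + rng.normal(scale=0.1, size=n)  # trend 1, effect 3

est = did(y_pre, y_post, treated)
print(est)  # close to the true effect of 3
```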
In this paper, we discuss recent developments in econometrics that we view as important for empirical researchers working on policy evaluation questions. We focus on three main areas, in each case highlighting recommendations for applied work. First, we discuss new research on identification strategies in program evaluation, with particular focus on synthetic control methods, regression discontinuity, external validity, and the causal interpretation of regression methods. Second, we discuss various forms of supplementary analyses, including placebo analyses as well as sensitivity and robustness analyses, intended to make the identification strategies more credible. Third, we discuss some implications of recent advances in machine learning methods for causal effects, including methods to adjust for differences between treated and control units in high-dimensional settings, and methods for identifying and estimating heterogeneous treatment effects.
In this paper we propose methods for estimating heterogeneity in causal effects in experimental and observational studies and for conducting hypothesis tests about the magnitude of differences in treatment effects across subsets of the population. We provide a data-driven approach to partition the data into subpopulations that differ in the magnitude of their treatment effects. The approach enables the construction of valid confidence intervals for treatment effects, even with many covariates relative to the sample size, and without "sparsity" assumptions. We propose an "honest" approach to estimation, whereby one sample is used to construct the partition and another to estimate treatment effects for each subpopulation. Our approach builds on regression tree methods, modified to optimize for goodness of fit in treatment effects and to account for honest estimation. Our model selection criterion anticipates that bias will be eliminated by honest estimation and also accounts for the effect of making additional splits on the variance of treatment effect estimates within each subpopulation. We address the challenge that the "ground truth" for a causal effect is not observed for any individual unit, so that standard approaches to cross-validation must be modified. Through a simulation study, we show that for our preferred method honest estimation results in nominal coverage for 90% confidence intervals, whereas coverage ranges between 74% and 84% for nonhonest approaches. Honest estimation requires estimating the model with a smaller sample size; the cost in terms of mean squared error of treatment effects for our preferred method ranges between 7% and 22%.
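The sample-splitting idea behind "honest" estimation can be sketched with a single, given split (hypothetical example: the paper builds full trees with a modified splitting criterion, whereas here the partition is fixed in advance). One half of the sample would be reserved for choosing the partition; only the disjoint estimation half produces leaf-level effect estimates, so those estimates carry no selection bias from the search over splits.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
x = rng.uniform(size=n)
w = rng.integers(0, 2, size=n)              # randomized treatment
tau = np.where(x <= 0.5, 1.0, 3.0)          # heterogeneous treatment effect
y = tau * w + rng.normal(scale=0.1, size=n)

idx = rng.permutation(n)
train, est = idx[: n // 2], idx[n // 2:]    # disjoint halves

# The "train" half would be used to search for the split point; here the
# split at x = 0.5 is taken as given. Only the estimation half is used to
# compute the effect within each leaf.
leaf_effects = {}
for name, mask in [("x<=0.5", x[est] <= 0.5), ("x>0.5", x[est] > 0.5)]:
    ye, we = y[est][mask], w[est][mask]
    leaf_effects[name] = float(ye[we == 1].mean() - ye[we == 0].mean())
print(leaf_effects)  # roughly 1 in the left leaf, 3 in the right leaf
```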
In Abadie and Imbens (2006), it was shown that simple nearest-neighbor matching estimators include a conditional bias term that converges to zero at a rate that may be slower than N^{1/2}. As a result, matching estimators are not N^{1/2}-consistent in general. In this article, we propose a bias correction that renders matching estimators N^{1/2}-consistent and asymptotically normal. To demonstrate the methods proposed in this article, we apply them to the National Supported Work (NSW) data, originally analyzed in Lalonde (1986). We also carry out a small simulation study based on the NSW example. In this simulation study, a simple implementation of the bias-corrected matching estimator performs well compared to both simple matching estimators and to regression estimators in terms of bias, root mean squared error, and coverage rates. Software to compute the estimators proposed in this article is available on the authors' web pages (http://www.economics.harvard.edu/faculty/imbens/software.html) and documented in Abadie et al. (2003).
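A minimal one-covariate sketch of the bias-correction idea is given below (illustrative only: the article's estimator handles vector covariates and provides matching-robust variance estimates). Each matched control outcome is adjusted by the difference in an estimated regression function, here a linear fit on the controls, evaluated at the treated and matched covariate values.

```python
import numpy as np

def att_bias_corrected(y, w, x):
    """1-NN matching on a scalar covariate with a regression bias correction:
    each matched control outcome is adjusted by mu0(x_i) - mu0(x_j),
    where mu0 is a linear fit on the control units."""
    y, w, x = map(np.asarray, (y, w, x))
    treated = np.where(w == 1)[0]
    controls = np.where(w == 0)[0]
    # estimate mu0 on the controls by least squares
    Xc = np.column_stack([np.ones(len(controls)), x[controls]])
    beta = np.linalg.lstsq(Xc, y[controls], rcond=None)[0]
    mu0 = lambda v: beta[0] + beta[1] * v
    effects = []
    for i in treated:
        j = controls[np.argmin(np.abs(x[controls] - x[i]))]
        effects.append(y[i] - y[j] - (mu0(x[i]) - mu0(x[j])))
    return float(np.mean(effects))

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=n)
w = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)
y = 2.0 * w + x + rng.normal(scale=0.1, size=n)  # true effect = 2

att_bc = att_bias_corrected(y, w, x)
print(att_bc)  # close to 2
```

The correction term removes the discrepancy between a treated unit's covariate value and that of its match, which is the source of the conditional bias in the simple matching estimator.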
Clustered standard errors, with clusters defined by factors such as geography, are widespread in empirical research in economics and many other disciplines. Formally, clustered standard errors adjust for the correlations induced by sampling the outcome variable from a data-generating process with unobserved cluster-level components. However, the standard econometric framework for clustering leaves important questions unanswered: (i) Why do we adjust standard errors for clustering in some ways but not others, for example, by state but not by gender, and in observational studies but not in completely randomized experiments? (ii) Is the clustered variance estimator valid if we observe a large fraction of the clusters in the population? (iii) In what settings does the choice of whether and how to cluster make a difference? We address these and other questions using a novel framework for clustered inference on average treatment effects. In addition to the common sampling component, the new framework incorporates a design component that accounts for the variability induced on the estimator by the treatment assignment mechanism. We show that, when the number of clusters in the sample is a nonnegligible fraction of the number of clusters in the population, conventional clustered standard errors can be severely inflated, and propose new variance estimators that correct for this bias.
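The conventional sampling-based adjustment the abstract starts from can be sketched for the simplest estimator, a sample mean (a toy illustration, not the paper's proposed design-based variance estimators): residuals are summed within clusters before squaring, so positive within-cluster correlation inflates the variance estimate relative to the i.i.d. formula.

```python
import numpy as np

def cluster_se_of_mean(y, cluster):
    """Cluster-robust standard error of a sample mean: residuals are summed
    within each cluster before squaring, so shared cluster-level shocks
    are reflected in the variance estimate."""
    y = np.asarray(y, dtype=float)
    resid = y - y.mean()
    totals = np.array([resid[cluster == g].sum() for g in np.unique(cluster)])
    return float(np.sqrt((totals ** 2).sum()) / len(y))

rng = np.random.default_rng(3)
G, m = 50, 20                                   # 50 clusters of 20 units
cluster = np.repeat(np.arange(G), m)
y = rng.normal(size=G)[cluster] + rng.normal(size=G * m)  # shared cluster shock + noise

naive_se = float(y.std(ddof=1) / np.sqrt(G * m))
clustered_se = cluster_se_of_mean(y, cluster)
print(naive_se, clustered_se)  # clustered SE is substantially larger
```

With a common shock per cluster, the naive i.i.d. standard error understates the uncertainty; the clustered version picks up the correlation through the cluster-level residual totals.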
Approximate residual balancing
Athey, Susan; Imbens, Guido W.; Wager, Stefan
Journal of the Royal Statistical Society, Series B (Statistical Methodology), September 2018, Volume 80, Issue 4
Journal Article
Peer reviewed
Open access
There are many settings where researchers are interested in estimating average treatment effects and are willing to rely on the unconfoundedness assumption, which requires that the treatment assignment be as good as random conditional on pretreatment variables. The unconfoundedness assumption is often more plausible if a large number of pretreatment variables are included in the analysis, but this can worsen the performance of standard approaches to treatment effect estimation. We develop a method for debiasing penalized regression adjustments to allow sparse regression methods like the lasso to be used for √n-consistent inference of average treatment effects in high-dimensional linear models. Given linearity, we do not need to assume that the treatment propensities are estimable, or that the average treatment effect is a sparse contrast of the outcome model parameters. Rather, in addition to standard assumptions used to make lasso regression on the outcome model consistent under ℓ1-norm error, we require only overlap, i.e. that the propensity score be uniformly bounded away from 0 and 1. Procedurally, our method combines balancing weights with a regularized regression adjustment.
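In one dimension, where exact balance is attainable, the combination of balancing weights and a regression adjustment reduces to a simple formula. The sketch below is a hypothetical toy version: OLS stands in for the lasso, and the minimum-norm weights balance the single covariate exactly, whereas in the high-dimensional setting the weights only balance approximately and the outcome model is penalized.

```python
import numpy as np

def arb_att(y, w, x):
    """1-covariate sketch of residual balancing for the ATT:
    predict the treated group's control-outcome mean with an outcome
    regression fit on the controls, plus a weighted sum of the control
    residuals, using minimum-norm weights that sum to one and match the
    treated covariate mean."""
    y, w, x = map(np.asarray, (y, w, x))
    yt, xt = y[w == 1], x[w == 1]
    yc, xc = y[w == 0], x[w == 0]
    # outcome model on controls: mu0(x) = b0 + b1 * x
    Xc = np.column_stack([np.ones(len(xc)), xc])
    beta = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    # min-norm weights: sum to 1, weighted control mean of x equals treated mean
    gamma = 1 / len(xc) + (xt.mean() - xc.mean()) * (xc - xc.mean()) / ((xc - xc.mean()) ** 2).sum()
    mu0_hat = beta[0] + beta[1] * xt.mean() + (gamma * (yc - Xc @ beta)).sum()
    return float(yt.mean() - mu0_hat)

rng = np.random.default_rng(5)
n = 2000
x = rng.normal(size=n)
w = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)
y = 2.0 * w + x + rng.normal(scale=0.1, size=n)  # true effect = 2

att_arb = arb_att(y, w, x)
print(att_arb)  # close to 2
```

In this toy case the weighted residual term is exactly zero, since OLS residuals are orthogonal to the covariate; the residual correction matters precisely when the outcome model is regularized and its fit is deliberately biased.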