The cluster robust variance estimator (CRVE) relies on the number of clusters being sufficiently large. Monte Carlo evidence suggests that the ‘rule of 42’ is not true for unbalanced clusters. Rejection frequencies are higher for datasets with 50 clusters proportional to US state populations than with 50 balanced clusters. Using critical values based on the wild cluster bootstrap performs much better. However, this procedure fails when a small number of clusters is treated. We explain why CRVE t statistics and the wild bootstrap fail in this case, study the ‘effective number’ of clusters and simulate placebo laws with dummy variable regressors.
Using large samples in econometrics
MacKinnon, James G.
Journal of Econometrics, August 2023, Volume 235, Issue 2
Journal Article
Peer reviewed
Open access
As I demonstrate using evidence from a journal data repository that I manage, the datasets used in empirical work are getting larger. When we use very large datasets, it can be dangerous to rely on standard methods for statistical inference. In addition, we need to worry about computational issues. We must be careful in our choice of statistical methods and the algorithms used to implement them.
Inference based on cluster-robust standard errors in linear regression models, using either the Student's t-distribution or the wild cluster bootstrap, is known to fail when the number of treated clusters is very small. We propose a family of new procedures called the subcluster wild bootstrap, which includes the ordinary wild bootstrap as a limiting case. In the case of pure treatment models, where all observations within clusters are either treated or not, the latter procedure can work remarkably well. The key requirement is that all cluster sizes, regardless of treatment, should be similar. Unfortunately, the analogue of this requirement is not likely to hold for difference-in-differences regressions. Our theoretical results are supported by extensive simulations and an empirical example.
Methods for cluster-robust inference are routinely used in economics and many other disciplines. However, it is only recently that theoretical foundations for the use of these methods in many empirically relevant situations have been developed. In this paper, we use these theoretical results to provide a guide to empirical practice. We do not attempt to present a comprehensive survey of the (very large) literature. Instead, we bridge theory and practice by providing a thorough guide on what to do and why, based on recently available econometric theory and simulation evidence. To practice what we preach, we include an empirical analysis of the effects of the minimum wage on labor supply of teenagers using individual data.
Inference using difference-in-differences with clustered data requires care. Previous research has shown that, when there are few treated clusters, t-tests based on cluster-robust variance estimators (CRVEs) severely overreject, and different variants of the wild cluster bootstrap can either overreject or underreject dramatically. We study two randomization inference (RI) procedures. A procedure based on estimated coefficients may be unreliable when clusters are heterogeneous. A procedure based on t-statistics typically performs better (although by no means perfectly) under the null, but at the cost of some power loss. An empirical example demonstrates that RI procedures can yield inferences that differ dramatically from those of other methods.
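As a rough illustration of coefficient-based randomization inference, the sketch below handles only a deliberately simplified case: a cross-sectional pure treatment model in which the estimated coefficient is a difference in means between treated and control observations. The function name `ri_coefficient_p`, the full enumeration of treatment assignments (including the actual one), and the two-sided p-value are assumptions made for this example; the abstract does not describe the authors' exact procedures, which concern difference-in-differences regressions.

```python
import itertools
import numpy as np

def ri_coefficient_p(y, cluster_ids, treated_clusters):
    """Coefficient-based randomization inference (illustrative sketch).

    Assumes a pure treatment model where the coefficient of interest is the
    treated-minus-control difference in means, and 0 < #treated < #clusters.
    """
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    labels = np.unique(cluster_ids)

    def diff_in_means(treated):
        d = np.isin(cluster_ids, list(treated))
        return y[d].mean() - y[~d].mean()

    b_obs = diff_in_means(treated_clusters)
    g1 = len(treated_clusters)
    # Re-assign treatment to every possible set of g1 clusters
    # (the actual assignment is one of them).
    placebo = np.array(
        [diff_in_means(c) for c in itertools.combinations(labels, g1)]
    )
    # Two-sided p-value; the small tolerance guards against rounding ties.
    p_value = np.mean(np.abs(placebo) >= np.abs(b_obs) - 1e-12)
    return b_obs, p_value
```

With G clusters and a single treated cluster, this enumeration has only G possible assignments, so the p-value it returns cannot fall below 1/G.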
In many fields of economics, and also in other disciplines, it is hard to justify the assumption that the random error terms in regression models are uncorrelated. It seems more plausible to assume that they are correlated within clusters, such as geographical areas or time periods, but uncorrelated across clusters. It has therefore become very popular to use “clustered” standard errors, which are robust against arbitrary patterns of within-cluster variation and covariation. Conventional methods for inference using clustered standard errors work very well when the model is correct and the data satisfy certain conditions, but they can produce very misleading results in other cases. This paper discusses some of the issues that users of these methods need to be aware of.
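To make "clustered" standard errors concrete, here is a minimal NumPy sketch of a sandwich estimator for OLS that sums scores within clusters. The function name `cluster_robust_se` and the particular small-sample factor G/(G-1) × (n-1)/(n-k) (often labelled CV1) are choices made for this illustration; the abstract does not specify which variant the paper discusses.

```python
import numpy as np

def cluster_robust_se(X, y, cluster_ids):
    """OLS coefficients with CV1-style cluster-robust standard errors (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    n, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta

    clusters = np.unique(cluster_ids)
    G = len(clusters)

    # "Meat" of the sandwich: outer products of per-cluster score vectors.
    meat = np.zeros((k, k))
    for g in clusters:
        mask = cluster_ids == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)

    # Small-sample correction (an assumption of this sketch).
    correction = G / (G - 1) * (n - 1) / (n - k)
    V = correction * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))
```

When every observation is its own cluster, the correction reduces to n/(n-k) and the result coincides with the usual HC1 heteroskedasticity-robust standard errors, which is a convenient sanity check.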
Across the tropics, smallholder farmers already face numerous risks to agricultural production. Climate change is expected to disproportionately affect smallholder farmers and make their livelihoods even more precarious; however, there is limited information on their overall vulnerability and adaptation needs. We conducted surveys of 600 households in Madagascar to characterize the vulnerability of smallholder farmers, identify how farmers cope with risks and explore what strategies are needed to help them adapt to climate change. Malagasy farmers are particularly vulnerable to any shocks to their agricultural system owing to their high dependence on agriculture for their livelihoods, chronic food insecurity, physical isolation and lack of access to formal safety nets. Farmers are frequently exposed to pest and disease outbreaks and extreme weather events (particularly cyclones), which cause significant crop and income losses and exacerbate food insecurity. Although farmers use a variety of risk-coping strategies, these are insufficient to prevent them from remaining food insecure. Few farmers have adjusted their farming strategies in response to climate change, owing to limited resources and capacity. Urgent technical, financial and institutional support is needed to improve the agricultural production and food security of Malagasy farmers and make their livelihoods resilient to climate change.
We study inference based on cluster-robust variance estimators for regression models with clustered errors, focusing on the wild cluster bootstrap. We state conditions under which asymptotic and bootstrap tests and confidence intervals are asymptotically valid. These conditions put limits on the rates at which the cluster sizes can increase as the number of clusters tends to infinity. We also derive Edgeworth expansions for the asymptotic and bootstrap test statistics. Simulation experiments illustrate the theoretical results and suggest that alternative variants of the wild cluster bootstrap may perform quite differently. The Edgeworth expansions explain the overrejection of asymptotic tests and shed light on the choice of auxiliary distribution and whether to use restricted or unrestricted estimates in the bootstrap data-generating process.
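The two design choices mentioned in this abstract, the auxiliary distribution and restricted versus unrestricted estimates, can be seen in a small sketch of one wild cluster bootstrap variant. The version below assumes Rademacher (±1) weights drawn once per cluster and a bootstrap data-generating process built from estimates that impose the null hypothesis β_j = 0. The names `wild_cluster_bootstrap_p` and `_cluster_t`, the CV1-style variance, and the symmetric p-value are choices for this illustration only, not the authors' implementation.

```python
import numpy as np

def _cluster_t(X, y, cluster_ids, j):
    """t-statistic for beta_j = 0 using a CV1-style cluster-robust variance."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    labels, idx = np.unique(cluster_ids, return_inverse=True)
    G = len(labels)
    scores = np.zeros((G, k))
    np.add.at(scores, idx, X * resid[:, None])  # sum scores within clusters
    V = G / (G - 1) * (n - 1) / (n - k) * XtX_inv @ (scores.T @ scores) @ XtX_inv
    return beta[j] / np.sqrt(V[j, j])

def wild_cluster_bootstrap_p(X, y, cluster_ids, j, B=999, seed=0):
    """Restricted wild cluster bootstrap p-value for H0: beta_j = 0 (sketch).

    Assumes X has at least two columns, so the restricted model is non-empty.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    labels, idx = np.unique(np.asarray(cluster_ids), return_inverse=True)
    G = len(labels)

    t_obs = _cluster_t(X, y, idx, j)

    # Restricted estimates: dropping column j imposes beta_j = 0.
    X_r = np.delete(X, j, axis=1)
    beta_r = np.linalg.lstsq(X_r, y, rcond=None)[0]
    fitted_r = X_r @ beta_r
    resid_r = y - fitted_r

    t_boot = np.empty(B)
    for b in range(B):
        v = rng.choice([-1.0, 1.0], size=G)   # one Rademacher draw per cluster
        y_star = fitted_r + resid_r * v[idx]  # bootstrap sample under the null
        t_boot[b] = _cluster_t(X, y_star, idx, j)

    p_value = np.mean(np.abs(t_boot) >= np.abs(t_obs))
    return t_obs, p_value
```

Switching to unrestricted estimates would mean building `y_star` from the full-model fit and centring each bootstrap t-statistic at the original estimate of β_j, and a different auxiliary distribution would replace the `rng.choice` draw.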
We study two cluster-robust variance estimators (CRVEs) for regression models with clustering in two dimensions and give conditions under which t-statistics based on each of them yield asymptotically valid inferences. In particular, one of the CRVEs requires stronger assumptions about the nature of the intra-cluster correlations. We then propose several wild bootstrap procedures and state conditions under which they are asymptotically valid for each type of t-statistic. Extensive simulations suggest that using certain bootstrap procedures with one of the t-statistics generally performs very well. An empirical example confirms that bootstrap inferences can differ substantially from conventional ones.
The wild bootstrap was originally developed for regression models with heteroskedasticity of unknown form. Over the past 30 years, it has been extended to models estimated by instrumental variables and maximum likelihood and to ones where the error terms are (perhaps multiway) clustered. Like bootstrap methods in general, the wild bootstrap is especially useful when conventional inference methods are unreliable because large-sample assumptions do not hold. For example, there may be few clusters, few treated clusters, or weak instruments. The package boottest can perform a wide variety of wild bootstrap tests, often at remarkable speed. It can also invert these tests to construct confidence sets. As a postestimation command, boottest works after linear estimation commands, including regress, cnsreg, ivregress, ivreg2, areg, and reghdfe, as well as many estimation commands based on maximum likelihood. Although it is designed to perform the wild cluster bootstrap, boottest can also perform the ordinary (nonclustered) version. Wrappers offer classical Wald, score/Lagrange multiplier, and Anderson–Rubin tests, optionally with (multiway) clustering. We review the main ideas of the wild cluster bootstrap, offer tips for use, explain why it is particularly amenable to computational optimization, state the syntax of boottest, artest, scoretest, and waldtest, and present several empirical examples.