This paper considers the maximum likelihood estimation of factor models of high dimension, where the number of variables (N) is comparable with or even greater than the number of observations (T). An inferential theory is developed. We establish not only consistency but also the rate of convergence and the limiting distributions. Five different sets of identification conditions are considered. We show that the distributions of the maximum likelihood estimators depend on the identification restrictions. Unlike the principal components approach, the maximum likelihood estimator explicitly allows for heteroskedasticities, which are jointly estimated with other parameters. The efficiency of the MLE relative to the principal components method is also considered.
An approximate factor model of high dimension has two key features. First, the idiosyncratic errors are correlated and heteroskedastic over both the cross-section and time dimensions; the correlations and heteroskedasticities are of unknown forms. Second, the number of variables is comparable with or even greater than the sample size. Thus, a high-dimensional approximate factor model involves a large number of parameters. The most widely used estimation approaches are based on principal components. This paper considers the maximum likelihood-based estimation of the model. Consistency, rate of convergence, and limiting distributions are obtained under various identification restrictions. Monte Carlo simulations show that the likelihood method is easy to implement and has good finite sample properties.
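The joint estimation of loadings and idiosyncratic variances can be illustrated with the classical EM algorithm for factor analysis, applied to the static model x_t = Λf_t + e_t with e_t ~ N(0, diag(Ψ)). This is a hedged sketch of the general idea, not the paper's exact estimator; the function name and the simulation design are illustrative.

```python
import numpy as np

def factor_mle_em(X, r, n_iter=200):
    """EM iterations for the factor model x_t = Lambda f_t + e_t,
    e_t ~ N(0, diag(Psi)); returns loadings and idiosyncratic variances."""
    T, N = X.shape
    S = X.T @ X / T                                  # sample covariance (N x N)
    w, v = np.linalg.eigh(S)
    Lam = v[:, -r:] * np.sqrt(np.maximum(w[-r:], 0.1))  # crude initial loadings
    Psi = np.ones(N)
    for _ in range(n_iter):
        Sigma = Lam @ Lam.T + np.diag(Psi)
        B = Lam.T @ np.linalg.inv(Sigma)             # E[f_t | x_t] = B x_t
        Eff = np.eye(r) - B @ Lam + B @ S @ B.T      # average of E[f_t f_t' | x_t]
        Lam = S @ B.T @ np.linalg.inv(Eff)           # M-step: loadings
        Psi = np.maximum(np.diag(S - Lam @ B @ S), 1e-6)  # M-step: variances
    return Lam, Psi

# Simulated example with cross-sectionally heteroskedastic errors.
rng = np.random.default_rng(5)
T, N, r = 2000, 10, 1
Lam0 = rng.standard_normal((N, r))
Psi0 = rng.uniform(0.5, 2.0, N)                      # heteroskedastic variances
X = rng.standard_normal((T, r)) @ Lam0.T + rng.standard_normal((T, N)) * np.sqrt(Psi0)
Lam_hat, Psi_hat = factor_mle_em(X, r)
# The idiosyncratic variances are recovered jointly with the loadings.
print(np.max(np.abs(Psi_hat - Psi0)) < 0.5)
```

Note that the idiosyncratic variances Ψ are updated at every iteration rather than fixed in advance, which is the sense in which heteroskedasticity is estimated jointly with the other parameters.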
This paper considers generalized least squares (GLS) estimation for linear panel data models. By estimating the large error covariance matrix consistently, the proposed feasible GLS estimator is more efficient than ordinary least squares in the presence of heteroskedasticity and serial and cross-sectional correlation. The covariance matrix used for the feasible GLS is estimated via the banding and thresholding method. We establish the limiting distribution of the proposed estimator. A Monte Carlo study is conducted, and the proposed method is illustrated in an empirical application.
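A minimal sketch of the thresholding step for regularizing a large residual covariance matrix (banding is analogous when the cross-section has a natural ordering); the cutoff `tau` here is a fixed illustrative choice, not the paper's data-driven rule.

```python
import numpy as np

def threshold_cov(E, tau):
    """Sample covariance of residual matrix E (T x N), with off-diagonal
    entries smaller than tau in absolute value set to zero."""
    S = np.cov(E, rowvar=False)
    mask = (np.abs(S) >= tau) | np.eye(S.shape[0], dtype=bool)  # keep diagonal
    return S * mask

# With independent series, the true covariance is diagonal; thresholding
# zeroes out the small spurious off-diagonal sample covariances.
rng = np.random.default_rng(4)
E = rng.standard_normal((500, 4))
S_t = threshold_cov(E, tau=0.2)
print(np.allclose(S_t, np.diag(np.diag(S_t))))
```

The thresholded (or banded) estimate is then inverted and plugged into the usual GLS formula in place of the unknown error covariance.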
We consider the situation when there is a large number of series, N, each with T observations, and each series has some predictive ability for some variable of interest. A methodology of growing interest is first to estimate common factors from the panel of data by the method of principal components and then to augment an otherwise standard regression with the estimated factors. In this paper, we show that the least squares estimates obtained from these factor-augmented regressions are $\sqrt{T}$-consistent and asymptotically normal if $\sqrt{T}/N \rightarrow 0$. The conditional mean predicted by the estimated factors is $\min\{\sqrt{T}, \sqrt{N}\}$-consistent and asymptotically normal. Except when T/N goes to zero, inference should take into account the effect of "estimated regressors" on the estimated conditional mean. We present analytical formulas for prediction intervals that are valid regardless of the magnitude of N/T and that can also be used when the factors are nonstationary.
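The two-step procedure can be sketched as follows: extract factors from the panel by principal components, then run least squares of the target variable on the estimated factors. This is an illustrative implementation under simulated data; the function name is ours.

```python
import numpy as np

def factor_augmented_ols(y, X, r):
    """Estimate r factors from panel X (T x N) by principal components,
    then regress y on a constant and the estimated factors."""
    T, N = X.shape
    w, v = np.linalg.eigh(X @ X.T / (T * N))
    F_hat = np.sqrt(T) * v[:, np.argsort(w)[::-1][:r]]  # T x r estimated factors
    Z = np.column_stack([np.ones(T), F_hat])            # constant + factors
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]         # least squares
    return beta, Z @ beta                               # coefficients, fitted mean

# Simulated example: one common factor drives both the panel and the target.
rng = np.random.default_rng(1)
T, N = 200, 100
F = rng.standard_normal((T, 1))
X = F @ rng.standard_normal((1, N)) + 0.1 * rng.standard_normal((T, N))
y = 1.0 + 2.0 * F[:, 0] + 0.1 * rng.standard_normal(T)
beta, y_hat = factor_augmented_ols(y, X, 1)
# The fitted conditional mean tracks the true mean closely when N and T are large.
print(np.corrcoef(y_hat, 1.0 + 2.0 * F[:, 0])[0, 1] > 0.95)
```

The abstract's point is precisely about the second step: because the regressors are estimated, standard errors for the fitted conditional mean must account for factor-estimation error unless T/N goes to zero.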
This paper studies two refinements to the method of factor forecasting. First, we consider the method of quadratic principal components that allows the link function between the predictors and the factors to be non-linear. Second, the factors used in the forecasting equation are estimated in a way that takes into account that the goal is to forecast a specific series. This is accomplished by applying the method of principal components to ‘targeted predictors’ selected using hard and soft thresholding rules. Our three main findings can be summarized as follows. First, we find improvements at all forecast horizons over the current diffusion index forecasts by estimating the factors using fewer but informative predictors. Allowing for non-linearity often leads to additional gains. Second, forecasting the volatile one-month-ahead inflation warrants a high degree of targeting to screen out the noisy predictors. A handful of variables, notably relating to housing starts and interest rates, are found to have systematic predictive power for inflation at all horizons. Third, the targeted predictors selected by both soft and hard thresholding change with the forecast horizon and the sample period. Holding the set of predictors fixed, as is current practice in factor forecasting, is unnecessarily restrictive.
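The hard-thresholding selection step can be sketched as follows: keep only those predictors whose marginal t-statistic against the target exceeds a cutoff, and then (in the paper's procedure) extract principal components from the retained subset. Only the selection step is shown here; the cutoff value and function name are illustrative choices, not the paper's.

```python
import numpy as np

def hard_threshold_select(y, X, cutoff=4.0):
    """Return indices of columns of X whose marginal regression on y
    yields a |t-statistic| above the cutoff (hard thresholding)."""
    T, N = X.shape
    keep = []
    for j in range(N):
        Z = np.column_stack([np.ones(T), X[:, j]])
        b = np.linalg.lstsq(Z, y, rcond=None)[0]
        e = y - Z @ b
        s2 = e @ e / (T - 2)
        se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
        if abs(b[1] / se) > cutoff:            # hard thresholding on the t-stat
            keep.append(j)
    return keep

# One informative predictor among noise: only it survives the threshold.
rng = np.random.default_rng(2)
T = 200
signal = rng.standard_normal(T)
X = np.column_stack([
    signal + 0.1 * rng.standard_normal(T),     # informative predictor
    rng.standard_normal(T),                    # pure noise
    rng.standard_normal(T),                    # pure noise
])
y = signal + 0.1 * rng.standard_normal(T)
selected = hard_threshold_select(y, X)
print(selected)
```

Soft thresholding differs in using a shrinkage-based (e.g. LASSO-type) rule rather than an all-or-nothing cut on the t-statistic.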
This paper develops an inferential theory for factor models of large dimensions. The principal components estimator is considered because it is easy to compute and is asymptotically equivalent to the maximum likelihood estimator (if normality is assumed). We derive the rate of convergence and the limiting distributions of the estimated factors, factor loadings, and common components. The theory is developed within the framework of large cross sections (N) and a large time dimension (T), to which classical factor analysis does not apply. We show that the estimated common components are asymptotically normal with a convergence rate equal to the minimum of the square roots of N and T. The estimated factors and their loadings are generally asymptotically normal, although not always so. The convergence rate of the estimated factors and factor loadings can be faster than that of the estimated common components. These results are obtained under general conditions that allow for correlations and heteroskedasticities in both dimensions. Stronger results are obtained when the idiosyncratic errors are serially uncorrelated and homoskedastic. A necessary and sufficient condition for consistency is derived for large N but fixed T.
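The principal components estimator itself is simple to compute. The sketch below, under the common normalization F'F/T = I, estimates factors and loadings from a simulated panel; the function name is illustrative.

```python
import numpy as np

def pc_factors(X, r):
    """Estimate r factors from panel X (T x N) by principal components.

    Returns F_hat (T x r) normalized so F'F/T = I_r, and Lambda_hat (N x r).
    """
    T, N = X.shape
    # Eigen-decomposition of XX'/(TN); factors are scaled eigenvectors.
    w, v = np.linalg.eigh(X @ X.T / (T * N))
    order = np.argsort(w)[::-1][:r]
    F_hat = np.sqrt(T) * v[:, order]      # T x r
    Lambda_hat = X.T @ F_hat / T          # N x r
    return F_hat, Lambda_hat

# Simulated one-factor panel with small idiosyncratic noise.
rng = np.random.default_rng(0)
T, N, r = 200, 100, 1
F = rng.standard_normal((T, r))
Lam = rng.standard_normal((N, r))
X = F @ Lam.T + 0.1 * rng.standard_normal((T, N))
F_hat, Lam_hat = pc_factors(X, r)
# Factors are identified only up to a rotation/sign, so check |correlation|.
print(abs(np.corrcoef(F_hat[:, 0], F[:, 0])[0, 1]) > 0.95)
```

Note that factors and loadings are identified only up to a rotation, which is why consistency statements in this literature are for the estimated factor space (or a rotation of the true factors).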
A widely held but untested assumption underlying macroeconomic analysis is that the number of shocks driving economic fluctuations, q, is small. In this article we associate q with the number of dynamic factors in a large panel of data. We propose a methodology to determine q without having to estimate the dynamic factors. We first estimate a VAR in r static factors, where the factors are obtained by applying the method of principal components to a large panel of data, and then compute the eigenvalues of the residual covariance or correlation matrix. We then test whether these eigenvalues satisfy an asymptotically shrinking bound that reflects sampling error. We apply the procedure to determine the number of primitive shocks in a large number of macroeconomic time series. An important aspect of the present analysis is to make precise the relationship between the dynamic factors and the static factors, which is a result of independent interest.
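The core computation can be illustrated as follows, in a deliberately simplified form (the article's actual test statistic and shrinking bound differ): estimate r static factors by principal components, fit a VAR(1) in those factors, and inspect the eigenvalues of the VAR residual covariance matrix, of which only q should be large.

```python
import numpy as np

def residual_eigenvalues(X, r):
    """Eigenvalues (descending) of the residual covariance from a VAR(1)
    in r static factors estimated from panel X (T x N) by principal components."""
    T, N = X.shape
    w, v = np.linalg.eigh(X @ X.T / (T * N))
    F = np.sqrt(T) * v[:, np.argsort(w)[::-1][:r]]     # r static factors
    A = np.linalg.lstsq(F[:-1], F[1:], rcond=None)[0]  # VAR(1) coefficients
    u = F[1:] - F[:-1] @ A                             # VAR residuals
    eig = np.linalg.eigvalsh(u.T @ u / u.shape[0])
    return np.sort(eig)[::-1]

# Two static factors driven by one primitive shock (a factor and its lag),
# so the residual covariance should have one dominant eigenvalue.
rng = np.random.default_rng(3)
T, N = 300, 100
shock = rng.standard_normal(T)
F_true = np.column_stack([shock, np.roll(shock, 1)])
X = F_true @ rng.standard_normal((2, N)) + 0.1 * rng.standard_normal((T, N))
eig = residual_eigenvalues(X, r=2)
print(eig[0] / eig[1] > 5)    # one dominant eigenvalue -> q = 1
```

The point of the construction is exactly the relationship mentioned in the abstract: r static factors (levels and lags of the dynamic factors) can be driven by only q < r primitive innovations, and the rank deficiency shows up in the VAR residuals.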
This paper develops a new methodology that makes use of the factor structure of large dimensional panels to understand the nature of nonstationarity in the data. We refer to it as PANIC (Panel Analysis of Nonstationarity in Idiosyncratic and Common components). PANIC can detect whether the nonstationarity in a series is pervasive, or variable-specific, or both. It can determine the number of independent stochastic trends driving the common factors. PANIC also permits valid pooling of individual statistics, and thus panel tests can be constructed. A distinctive feature of PANIC is that it tests the unobserved components of the data instead of the observed series. The key to PANIC is consistent estimation of the space spanned by the unobserved common factors and the idiosyncratic errors without knowing a priori whether these are stationary or integrated processes. We provide a rigorous theory for estimation and inference and show that the tests have good finite sample properties.
This paper studies estimation of panel cointegration models with cross-sectional dependence generated by unobserved global stochastic trends. The standard least squares estimator is, in general, inconsistent owing to the spuriousness induced by the unobservable I(1) trends. We propose two iterative procedures that jointly estimate the slope parameters and the stochastic trends. The resulting estimators are referred to as the CupBC (continuously-updated and bias-corrected) and CupFM (continuously-updated and fully-modified) estimators, respectively. We establish their consistency and derive their limiting distributions. Both are asymptotically unbiased and (mixed) normal and permit inference to be conducted using standard test statistics. The estimators are also valid when there are mixed stationary and non-stationary factors, as well as when the factors are all stationary.
Large factor models use a few latent factors to characterize the co-movement of economic variables in a high-dimensional data set. High dimensionality brings challenges as well as new insights into the advancement of econometric theory. Because of their ability to effectively summarize information in large data sets, factor models have been increasingly used in economics and finance. The factors, estimated from the high-dimensional data, can, for example, help improve forecasting, provide efficient instruments, control for nonlinear unobserved heterogeneity, and capture cross-sectional dependence. This article reviews the theory on estimation and statistical inference of large factor models. It also discusses important applications and highlights future directions.