Recent Developments in Factor Models and Applications in Econometric Learning

Jianqing Fan; Kunpeng Li; Yuan Liao

doi:10.1146/annurev-financial-091420-011735

Annual Review of Financial Economics

Volume 13, 2021

Review Article

Jianqing Fan¹, Kunpeng Li², and Yuan Liao³

Affiliations: ¹Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544, USA; ²International School of Economics and Management, Capital University of Economics and Business, Beijing 100070, China; ³Department of Economics, Rutgers University, New Brunswick, New Jersey 08901, USA
Vol. 13:401-430 (Volume publication date November 2021) https://doi.org/10.1146/annurev-financial-091420-011735
Copyright © 2021 by Annual Reviews. All rights reserved

Abstract

This article provides a selective overview of recent developments in factor models and their applications in econometric learning. We focus on the low-rank structure of factor models and draw particular attention to estimating the model from the low-rank recovery point of view. The survey consists of three main parts. The first reviews new factor estimation methods based on modern techniques for recovering low-rank structures in high-dimensional models. The second discusses statistical inference for several factor-augmented models and their applications in statistical learning. The final part summarizes new developments in handling unbalanced panels from the matrix completion perspective.

Keyword(s): factor adjustments, factor models, high-dimensional statistics, JEL C01, JEL C58, matrix completion, model selection, multiple testing, robustness, spiked low-rank matrix, unbalanced panel
