Abstract
We consider the model averaging estimation problem in the linear regression model with missing response data, allowing for model misspecification. Based on the ‘complete’ data set for the response variable after inverse propensity score weighted imputation, we construct a leave-one-out cross-validation criterion for allocating model weights, where the propensity score model is estimated by the covariate balancing propensity score method. We derive theoretical results to justify the proposed strategy. First, when all candidate outcome regression models are misspecified, our procedure is proved to achieve optimality in the sense of asymptotically minimizing the squared loss. Second, when the true outcome regression model is among the candidate models, the resulting model averaging estimators of the regression parameters are shown to be root-n consistent. Simulation studies provide evidence of the superiority of our methods over existing model averaging methods, even when the propensity score model is misspecified. As an illustration, the approach is further applied to the CD4 data.
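The procedure described above can be illustrated with a minimal numerical sketch. This is not the paper's implementation: a plain logistic MLE stands in for the covariate balancing propensity score, a coarse grid search stands in for the weight optimization, and all variable names and the simulated design are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)      # outcome model
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + x1)))             # true propensity
delta = (rng.uniform(size=n) < pi_true).astype(float)   # response indicator (MAR)

# Step 1: estimate the propensity score. A plain logistic MLE via Newton
# iterations stands in for the covariate balancing propensity score.
Z = np.column_stack([np.ones(n), x1])
alpha = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-Z @ alpha))
    W = p * (1.0 - p)
    alpha += np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (delta - p))
pi_hat = 1.0 / (1.0 + np.exp(-Z @ alpha))

# Step 2: inverse propensity score weighted imputation,
# y_hat_i = (delta_i/pi_hat_i) y_i + (1 - delta_i/pi_hat_i) x_i' beta_c,
# with beta_c the complete-case least-squares fit of the largest model.
X = np.column_stack([np.ones(n), x1, x2])
obs = delta == 1.0
beta_c = np.linalg.lstsq(X[obs], y[obs], rcond=None)[0]
w_ipw = delta / pi_hat
y_hat = w_ipw * np.where(obs, y, 0.0) + (1.0 - w_ipw) * (X @ beta_c)

# Step 3: leave-one-out predictions for each candidate model via the
# jackknife shortcut e_{-i} = e_i / (1 - h_ii).
candidates = [[0, 1], [0, 2], [0, 1, 2]]                # column sets
loo_preds = []
for cols in candidates:
    Xk = X[:, cols]
    H = Xk @ np.linalg.solve(Xk.T @ Xk, Xk.T)
    resid = y_hat - H @ y_hat
    loo_preds.append(y_hat - resid / (1.0 - np.diag(H)))
P = np.column_stack(loo_preds)

# Step 4: weights on the simplex minimizing CV(w) = ||y_hat - P w||^2;
# a coarse grid search is enough for illustration.
best_w, best_cv = None, np.inf
grid = np.linspace(0.0, 1.0, 51)
for a in grid:
    for b in grid:
        if a + b > 1.0:
            continue
        w = np.array([a, b, 1.0 - a - b])
        cv = np.sum((y_hat - P @ w) ** 2)
        if cv < best_cv:
            best_cv, best_w = cv, w
print(best_w)
```

Because the grid contains the vertices of the simplex, the selected weight vector can never do worse in CV than any single candidate model.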
References
Akaike, H. (1973). Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika, 60, 255–265.
Ando, T., & Li, K. C. (2014). A model-averaging approach for high-dimensional regression. Journal of the American Statistical Association, 109, 254–265.
Ando, T., & Li, K. C. (2017). A weight-relaxed model averaging approach for high-dimensional generalized linear models. The Annals of Statistics, 45, 2654–2679.
Buckland, S. T., Burnham, K. P., & Augustin, N. H. (1997). Model selection: an integral part of inference. Biometrics, 53, 603–618.
Chen, J., & Shao, J. (2000). Nearest neighbor imputation for survey data. Journal of Official Statistics, 16, 113–131.
Cheng, P. E. (1994). Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association, 89, 81–87.
Claeskens, G., Croux, C., & van Kerckhoven, J. (2006). Variable selection for logistic regression using a prediction-focused information criterion. Biometrics, 62, 972–979.
Dardanoni, V., Modica, S., & Peracchi, F. (2011). Regression with imputed covariates: A generalized missing-indicator approach. Journal of Econometrics, 162, 362–368.
Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics, 13, 253–263.
Ding, X., Xie, J., & Yan, X. (2021). Model averaging for multiple quantile regression with covariates missing at random. Journal of Statistical Computation and Simulation, 91, 2249–2275.
Fan, J., & Li, R. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association, 99, 710–723.
Fang, F., Lan, W., Tong, J., & Shao, J. (2019). Model averaging for prediction with fragmentary data. Journal of Business and Economic Statistics, 37, 517–527.
Gao, Y., Zhang, X., Wang, S., & Zou, G. (2016). Model averaging based on leave-subject-out cross-validation. Journal of Econometrics, 192, 139–151.
Guo, D., Xue, L., & Hu, Y. (2017). Covariate-balancing-propensity-score-based inference for linear models with missing responses. Statistics and Probability Letters, 123, 139–145.
Hansen, B. E. (2007). Least squares model averaging. Econometrica, 75, 1175–1189.
Hansen, B. E. (2008). Least squares forecast averaging. Journal of Econometrics, 146, 342–350.
Hansen, B. E. (2014). Model averaging, asymptotic risk, and regressor groups. Quantitative Economics, 5, 495–530.
Hansen, B. E., & Racine, J. S. (2012). Jackknife model averaging. Journal of Econometrics, 167, 38–46.
Hjort, N. L., & Claeskens, G. (2003). Frequentist model average estimators. Journal of the American Statistical Association, 98, 879–899.
Huang, J. Z., Wu, C. O., & Zhou, L. (2002). Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika, 89, 111–128.
Imai, K., & Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B, 76, 243–263.
Kang, J., & Schafer, J. L. (2007). Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22, 574–580.
King, G., Honaker, J., Joseph, A., & Scheve, K. (2001). Analyzing incomplete political science data: an alternative algorithm for multiple imputation. American Political Science Review, 95, 49–69.
Li, K. C. (1987). Asymptotic optimality for \(C_p\), \(C_L\), cross-validation and generalized cross-validation: discrete index set. The Annals of Statistics, 15, 958–975.
Liang, H., Wang, S., & Carroll, R. J. (2007). Partially linear models with missing response variables and error-prone covariates. Biometrika, 94, 185–198.
Liang, H., Wang, S., Robins, J. M., & Carroll, R. J. (2004). Estimation in partially linear models with missing covariates. Journal of the American Statistical Association, 99, 357–367.
Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd ed.). Hoboken, NJ: Wiley.
Liu, Q., & Okui, R. (2013). Heteroscedasticity-robust \(C_p\) model averaging. The Econometrics Journal, 16, 463–472.
Liu, Q., & Zheng, M. (2020). Model averaging for generalized linear model with covariates that are missing completely at random. The Journal of Quantitative Economics, 11, 25–40.
Lu, X., & Su, L. (2015). Jackknife model averaging for quantile regressions. Journal of Econometrics, 188, 40–58.
Mallows, C. L. (1973). Some comments on \(C_{p}\). Technometrics, 15, 661–675.
Newey, W. K., & McFadden, D. (1994). Large sample estimation and hypothesis testing. In R. F. Engle & D. L. McFadden (Eds.), Handbook of Econometrics (Vol. IV, pp. 2111–2245). Amsterdam: North-Holland.
Qin, Y., & Lei, Q. (2010). On empirical likelihood for linear models with missing responses. Journal of Statistical Planning and Inference, 140, 3399–3408.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.
Scharfstein, D. O., Rotnitzky, A., & Robins, J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94, 1096–1120.
Schomaker, M., Wan, A. T. K., & Heumann, C. (2010). Frequentist model averaging with missing observations. Computational Statistics and Data Analysis, 54, 3336–3347.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Sun, Z., Su, Z., & Ma, J. (2014). Focused vector information criterion model selection and model averaging regression with missing response. Metrika, 77, 415–432.
Sun, Z., Wang, Q., & Dai, P. (2009). Model checking for partially linear models with missing responses at random. Journal of Multivariate Analysis, 100, 636–651.
Wan, A. T. K., Zhang, X., & Zou, G. (2010). Least squares model averaging by Mallows criterion. Journal of Econometrics, 156, 277–283.
Wang, Q., Linton, O., & Härdle, W. (2004). Semiparametric regression analysis with missing response at random. Journal of the American Statistical Association, 99, 334–345.
Wang, Q., & Rao, J. N. K. (2002). Empirical likelihood-based inference in linear models with missing data. Scandinavian Journal of Statistics, 29, 563–576.
Wei, Y., & Wang, Q. (2021). Cross-validation-based model averaging in linear models with response missing at random. Statistics and Probability Letters, 171, 108990.
Wei, Y., Wang, Q., & Liu, W. (2021). Model averaging for linear models with responses missing at random. Annals of the Institute of Statistical Mathematics, 73, 535–553.
Whittle, P. (1960). Bounds for the moments of linear and quadratic forms in independent variables. Theory of Probability and Its Applications, 5, 302–305.
Xie, J., Yan, X., & Tang, N. (2021). A model-averaging method for high-dimensional regression with missing responses at random. Statistica Sinica, 31, 1005–1026.
Xue, F., & Qu, A. (2021). Integrating multi-source block-wise missing data in model selection. Journal of the American Statistical Association, 116, 1914–1927.
Xue, L. (2009). Empirical likelihood for linear models with missing responses. Journal of Multivariate Analysis, 100, 1353–1366.
Xue, L., & Xue, D. (2011). Empirical likelihood for semiparametric regression model with missing response data. Journal of Multivariate Analysis, 102, 723–740.
Yuan, C., Wu, Y., & Fang, F. (2022). Model averaging for generalized linear models in fragmentary data prediction. Statistical Theory and Related Fields, 6, 344–352.
Yuan, Z., & Yang, Y. (2005). Combining linear regression models: when and how? Journal of the American Statistical Association, 100, 1202–1214.
Zeng, J., Cheng, W., Hu, G., & Rong, Y. (2018). Model averaging procedure for varying-coefficient partially linear models with missing responses. Journal of the Korean Statistical Society, 47, 379–394.
Zhang, X. (2013). Model averaging with covariates that are missing completely at random. Economics Letters, 121, 360–363.
Zhang, X., & Liang, H. (2011). Focused information criterion and model averaging for generalized additive partial linear models. The Annals of Statistics, 39, 174–200.
Zhang, X., & Liu, C. A. (2023). Model averaging prediction by K-fold cross-validation. Journal of Econometrics, 235, 280–301.
Zhang, X., Wan, A. T. K., & Zou, G. (2013). Model averaging by jackknife criterion in models with dependent data. Journal of Econometrics, 174, 82–94.
Zhang, X., & Wang, W. (2019). Optimal model averaging estimation for partially linear models. Statistica Sinica, 29, 693–718.
Zhang, X., Yu, D., Zou, G., & Liang, H. (2016). Optimal model averaging estimation for generalized linear models and generalized linear mixed-effects models. Journal of the American Statistical Association, 111, 1775–1790.
Zhang, X., Zou, G., & Carroll, R. J. (2015). Model averaging based on Kullback–Leibler distance. Statistica Sinica, 25, 1583–1598.
Zhang, Y., Tang, N., & Qu, A. (2020). Imputed factor regression for high-dimensional block-wise missing data. Statistica Sinica, 30, 631–651.
Zhu, R., Wan, A. T. K., Zhang, X., & Zou, G. (2019). A Mallows-type model averaging estimator for the varying-coefficient partially linear model. Journal of the American Statistical Association, 114, 882–892.
Acknowledgements
The authors would like to thank the reviewers and editors for their careful reading and constructive comments. The work of Zeng was supported by the Important Natural Science Foundation of Colleges and Universities of Anhui Province (No. KJ2021A0929) and the Research Project of Hefei Normal University (No. 2023XTQTZD28). The work of Hu was supported by the Important Natural Science Foundation of Colleges and Universities of Anhui Province (No. KJ2021A0930) and the Research Project of Hefei Normal University (No. 2023XTTDZD06).
Appendix: Proofs of the main results
For convenience, we denote by C a generic constant whose value may differ at each appearance. Write \(\pi _{i}=\pi (X_{i}, \alpha )\) and \(\hat{\pi }_{i}=\pi (X_{i}, \hat{\alpha })\) for \(i=1,\ldots , n\). Let \(A^{(k)}=I_{n}-H^{(k)}\), \(T^{(k)}=D^{(k)}-I_{n}\), and \(G^{(k)}=T^{(k)}A^{(k)}\). Denote \(A(w)=\sum _{k=1}^{K}w_{k}A^{(k)}\), \(G(w)=\sum _{k=1}^{K}w_{k}G^{(k)}\), and \(M(w)=A'(w)G(w)+G'(w)A(w)+G'(w)G(w)\).
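The role of \(M(w)\) in the later expansions comes from the algebraic identity \(\{A(w)+G(w)\}'\{A(w)+G(w)\}=A'(w)A(w)+M(w)\): the cross terms collected in \(M(w)\) are exactly the gap between the jackknife criterion and the squared loss. A small numerical sketch confirms this, using hypothetical candidate models built from a random design.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 30, 3
X = rng.normal(size=(n, 5))
cols = [[0, 1], [0, 2, 3], [0, 1, 2, 3, 4]]     # hypothetical candidate models

A_list, G_list = [], []
for c in cols:
    Xk = X[:, c]
    H = Xk @ np.linalg.solve(Xk.T @ Xk, Xk.T)   # hat matrix H^{(k)}
    A = np.eye(n) - H                           # A^{(k)} = I - H^{(k)}
    D = np.diag(1.0 / (1.0 - np.diag(H)))       # D^{(k)}, jackknife rescaling
    T = D - np.eye(n)                           # T^{(k)} = D^{(k)} - I
    A_list.append(A)
    G_list.append(T @ A)                        # G^{(k)} = T^{(k)} A^{(k)}

w = np.array([0.2, 0.3, 0.5])                   # any weight vector on the simplex
Aw = sum(wk * Ak for wk, Ak in zip(w, A_list))
Gw = sum(wk * Gk for wk, Gk in zip(w, G_list))
Mw = Aw.T @ Gw + Gw.T @ Aw + Gw.T @ Gw

# Verify (A(w) + G(w))'(A(w) + G(w)) = A'(w)A(w) + M(w).
lhs = (Aw + Gw).T @ (Aw + Gw)
rhs = Aw.T @ Aw + Mw
print(np.allclose(lhs, rhs))
```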
Lemma 1
Suppose that \(X_{i}, i=1,\ldots ,n\) are a set of i.i.d. random vectors. If condition (C1) is satisfied, then \(\hat{\alpha } {\mathop {\longrightarrow }\limits ^{p}} \alpha _{0}\), and \(\hat{\alpha } - \alpha _{0}=O_{p}(n^{-1/2})\).
Proof
The proof of \(\hat{\alpha } {\mathop {\longrightarrow }\limits ^{p}} \alpha _{0}\) is the same as that of Theorem 1 in Guo et al. (2017), and \(\hat{\alpha } - \alpha _{0}=O_{p}(n^{-1/2})\) is implied by Theorem 3.2 of Newey and McFadden (1994). \(\square\)
Lemma 2
Under condition (C1), we have
Proof
Simple calculations yield \(\check{y}_{i}-\mu _{0i}=s_{i}\), \(\hat{y}_{i}-\mu _{0i}=s_{i}+e_{i}\), where \(s_{i}=\dfrac{\delta _{i}}{\pi _{i}}(y_{i}-X_{i}'\beta )+(X_{i}'\beta -\mu _{0i})\), \(e_{i}=(1-\dfrac{\delta _{i}}{\hat{\pi }_{i}})X_{i}'(\hat{\beta }_{c}-\beta )+\delta _{i}(\dfrac{1}{\hat{\pi }_{i}}-\dfrac{1}{\pi _{i}})(y_{i}-X_{i}'\beta )\). Therefore, \(\hat{y}_{i}-\check{y}_{i}=e_{i}\) and \(\Vert \hat{Y}-\check{Y} \Vert ^{2}=\sum _{i=1}^{n}e_{i}^{2}\). By the Taylor series expansion, condition (C1) and Lemma 1, we have
where \(\tilde{\alpha }\) is a vector between \(\hat{\alpha }\) and \(\alpha _{0}\). This together with the facts that \(\pi _{i}\) is bounded away from 0 and \(\dfrac{\delta _{i}}{\hat{\pi }_{i}} =\dfrac{\delta _{i}}{\pi _{i}}+(\dfrac{\delta _{i}}{\hat{\pi }_{i}} -\dfrac{\delta _{i}}{\pi _{i}})\), it is easy to verify that
Combining (A.1), (A.2), the fact that \(\hat{\beta }_{c}-\beta =O_{p}(n^{-1/2})\) and condition (C1), the lemma is proved. \(\square\)
Lemma 3
Suppose that condition (C2) holds. Then we have
Proof
\(\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{A(w)\}\le \sup \limits _{w\in \mathcal {W}} \displaystyle \sum _{k=1}^{K}w_{k}\bar{\lambda }(A^{(k)})\le \max _{1\le k \le K}\bar{\lambda }(A^{(k)})\le 1.\)
Note that \(\bar{\lambda }(T^{(k)})=\max \limits _{1\le j \le n}\left\{ (1-h_{jj}^{(k)})^{-1}-1\right\} =\max \limits _{1\le j \le n} \dfrac{h_{jj}^{(k)}}{1-h_{jj}^{(k)}}\). By (A.3) and condition (C2), we have
Combining (A.3) and (A.4), we obtain
\(\square\)
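The spectral facts used in Lemma 3 can be checked numerically: \(A^{(k)}=I_{n}-H^{(k)}\) is symmetric idempotent, so its largest eigenvalue is exactly 1, and \(T^{(k)}\) is diagonal with entries \(h_{jj}^{(k)}/(1-h_{jj}^{(k)})\), so its largest eigenvalue is the maximum of those ratios. The design below is a hypothetical random one used only for the check.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = rng.normal(size=(n, 4))
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
h = np.diag(H)

# A = I - H is symmetric idempotent: largest eigenvalue is exactly 1
A = np.eye(n) - H
assert np.isclose(np.linalg.eigvalsh(A).max(), 1.0)

# T = D - I is diagonal with entries h_jj/(1 - h_jj):
# its largest eigenvalue equals max_j h_jj/(1 - h_jj)
T = np.diag(1.0 / (1.0 - h)) - np.eye(n)
assert np.isclose(np.linalg.eigvalsh(T).max(), (h / (1.0 - h)).max())
print("spectral bounds verified")
```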
Proof of Theorem 1
Denote \(e=(e_{1}, \ldots , e_{n})'\) and \(S=(s_{1}, \ldots , s_{n})'\). Then \(\Sigma _{S}=E(SS')\) is a diagonal matrix whose \(i\)th diagonal element is
Observe that CV(w) can be written as
where the last three terms are independent of w. Therefore, the claim \(\hat{L}_{n}(\hat{w})/\{\inf _{w\in \mathcal {W}}\hat{L}_{n} (w)\}{\mathop {\longrightarrow }\limits ^{p}}1\) is valid if the following hold as \(n\rightarrow \infty\):
and
According to the definition of \(\check{R}_{n}(w)\), it is straightforward to show that
From Chebyshev’s inequality, Theorem 2 of Whittle (1960), (A.19), and conditions (C4) and (C5), for any \(\tau >0\), we observe that
Thus, (A.7) is proved. To show (A.8), it suffices to prove that
and
Similar to the proof of (A.7), we see, for any \(\tau >0\), that
(A.20) is thus obtained. On the other hand, it follows from
and condition (C6) that
So (A.21) holds. We have proved (A.8). By condition (C4), \(E\{(S'S)^{1/2}\}\le \{E(S'S)\}^{1/2}\le \sqrt{nC_{S}}\), and hence \((S'S)^{1/2}=O_{p}(\sqrt{n})\) and \(S'S=O_{p}(n)\). This together with Lemmas 2 and 3 and conditions (C3) and (C6) yields
So (A.9), (A.13), (A.15) and (A.16) are correct. Similar to the above proof steps, it can be demonstrated that
Therefore, (A.10), (A.11), (A.12), (A.14) and (A.17) are valid.
Next we prove (A.18). Because
and
it is sufficient to verify that
and
Additionally, recognize that
so (A.22) is implied by
and
To prove (A.24), we follow the steps used to show (A.20): for any \(\tau >0\),
So (A.24) is satisfied. The fact \(\text {tr}(H^{(t)}{'}H^{(k)})\le \left\{ \text {tr}(H^{(t)}{'}H^{(t)})\right\} ^{1/2} \left\{ \text {tr}(H^{(k)}{'}H^{(k)})\right\} ^{1/2}\le \sqrt{p_{t}p_{k}}\) together with conditions (C4) and (C6) demonstrates that
which, along with (A.24), implies that \(\sup \limits _{w\in \mathcal {W}}\Vert H(w)S\Vert ^{2}/\check{R}_{n}(w)=o_{p}(1)\). This together with the Cauchy–Schwarz inequality and (A.19) indicates
Thus, (A.25) follows. By Lemma 2 and \(\check{\xi }_{n}\rightarrow \infty\), it is not difficult to verify (A.23). Theorem 1 is then proved. \(\square\)
Proof of Theorem 2
When the true model is indeed linear, i.e., \(\mu_{0i}=X_{i}'\beta\), we have \(\hat{y}_{i}-\mu _{0i}=\zeta _{i}\), where \(\zeta _{i}=\frac{\delta _{i}}{\pi _{i}}\epsilon _{i} +(1-\frac{\delta _{i}}{\hat{\pi }_{i}})X_{i}' (\hat{\beta }_{c}-\beta )+(\frac{1}{\hat{\pi }_{i}} -\frac{1}{\pi _{i}})\delta _{i}\epsilon _{i}\). Let \(\epsilon =(\epsilon _{1}, \ldots , \epsilon _{n})'\) and \(\zeta =(\zeta _{1}, \ldots , \zeta _{n})'\). From condition (C1) and the assumption that \(\sigma _{i}^{2}\) is finite, we obtain that \(\textbf{X}'\epsilon =O_{p}(n^{1/2})\), and hence \(\textbf{X}'\zeta =O_{p}(n^{1/2})\). It is seen that
where \(\Omega\) is a \(K\times K\) matrix whose \((k,j)\)th element is \(\Omega _{kj}=\hat{Y}'(I_{n}-H^{(k)})'(T^{(k)}+T^{(j)}+T^{(k)} T^{(j)})(I_{n}-H^{(j)})\hat{Y}\). By conditions (C1) and (C2), it follows that \(\Omega _{kj}=O_{p}(1)\). Hence, for any \(w\in \mathcal {W}\),
Let k be the true model belonging to \(\{1,\ldots , K\}\), then from Guo et al. (2017), we know that \(\bar{\beta }^{(k)}-\beta =O_{p}(n^{-1/2})\), which together with (A.26) and (A.27) and condition (C1), implies \(CV(w_{k}^{0})=\Vert \zeta \Vert ^{2}+\eta _{n}(w_{k}^{0})\) with
Therefore, \(CV(\hat{w})\le CV(w_{k}^{0})=\Vert \zeta \Vert ^{2}+\eta _{n}(w_{k}^{0})\), which, together with (A.26), implies
Let \(\Psi _{n}=n^{-1}\textbf{X}'\textbf{X}\). From (A.29), we obtain
and so
Write \(a_{n}=\eta _{n}(w_{k}^{0})+\{\underline{\lambda } (\Psi _{n})\}^{-1}\Vert n^{-1/2} \zeta {'}\textbf{X} \Vert ^{2}-\hat{w}'\Omega \hat{w}\). Then (A.30) is equivalent to
which, together with (A.27), (A.28) and \(\textbf{X}'\zeta =O_{p}(n^{1/2})\), means \(\sqrt{n}\{\bar{\beta }(\hat{w})-\beta \}=O_{p}(1)\). This completes the proof. \(\square\)
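The algebraic decomposition \(\hat{y}_{i}-\mu _{0i}=\zeta _{i}\) at the start of this proof can be verified numerically. The sketch below uses a hypothetical perturbed complete-case fit and a hypothetical estimated propensity; it only checks the identity, not any asymptotic statement.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10
X = rng.normal(size=(n, 3))
beta = np.array([1.0, -0.5, 2.0])
beta_c = beta + rng.normal(scale=0.1, size=3)     # a perturbed complete-case fit
eps = rng.normal(size=n)
y = X @ beta + eps                                # linear true model
pi = rng.uniform(0.3, 0.9, size=n)                # true propensity
pi_hat = pi + rng.uniform(-0.05, 0.05, size=n)    # its estimate
delta = (rng.uniform(size=n) < pi).astype(float)

# imputed response and its deviation from the true mean X beta
y_hat = (delta / pi_hat) * y + (1.0 - delta / pi_hat) * (X @ beta_c)
zeta = (delta / pi) * eps \
     + (1.0 - delta / pi_hat) * (X @ (beta_c - beta)) \
     + (1.0 / pi_hat - 1.0 / pi) * delta * eps
print(np.allclose(y_hat - X @ beta, zeta))
```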
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zeng, J., Cheng, W. & Hu, G. Jackknife model averaging for linear regression models with missing responses. J. Korean Stat. Soc. (2024). https://doi.org/10.1007/s42952-024-00259-2
Keywords
- Asymptotic optimality
- Cross-validation
- Model averaging
- Missing data
- Covariate balancing propensity score