Jackknife model averaging for linear regression models with missing responses

  • Research Article
  • Published: Journal of the Korean Statistical Society

Abstract

We consider the model averaging estimation problem in the linear regression model with missing response data, allowing for model misspecification. Based on the ‘complete’ response data set obtained by inverse propensity score weighted imputation, we construct a leave-one-out cross-validation criterion for allocating model weights, where the propensity score model is estimated by the covariate balancing propensity score method. We derive theoretical results that justify the proposed strategy. First, when all candidate outcome regression models are misspecified, our procedure is shown to be asymptotically optimal in the sense of minimizing the squared loss. Second, when the true outcome regression model is among the candidate models, the resulting model averaging estimators of the regression parameters are shown to be root-n consistent. Simulation studies provide evidence of the superiority of our methods over other existing model averaging methods, even when the propensity score model is misspecified. As an illustration, the approach is further applied to study the CD4 data.
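The procedure described in the abstract can be sketched numerically. The following is a minimal illustration only, not the authors' implementation: it plugs in the true propensity scores in place of the covariate balancing propensity score estimate, uses complete-case least squares for the imputation coefficient, and searches a grid over the weight simplex for two nested candidate models.

```python
# Minimal sketch of IPW imputation followed by jackknife model averaging.
# Illustration only: true propensities stand in for the CBPS estimate.
import numpy as np

rng = np.random.default_rng(0)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, 0.0])
y = X @ beta + rng.normal(size=n)

# Responses missing at random with propensity pi(X)
pi = 1.0 / (1.0 + np.exp(-(0.5 + X[:, 1])))
delta = rng.binomial(1, pi)

# Complete-case OLS coefficient, then inverse propensity weighted imputation
obs = delta == 1
beta_c = np.linalg.lstsq(X[obs], y[obs], rcond=None)[0]
y_hat = (delta / pi) * y + (1.0 - delta / pi) * (X @ beta_c)

# Leave-one-out residuals of each candidate model via the hat-matrix shortcut
def loo_residuals(Z, v):
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    return (v - H @ v) / (1.0 - np.diag(H))

E = np.column_stack([loo_residuals(X[:, :2], y_hat),   # small candidate model
                     loo_residuals(X, y_hat)])         # large candidate model

# Choose the weight on the large model by minimizing CV(w) over a grid
grid = np.linspace(0.0, 1.0, 101)
cv = [np.sum((E @ np.array([1.0 - w, w])) ** 2) for w in grid]
w_hat = grid[int(np.argmin(cv))]
print("weight on the large model:", w_hat)
```

In practice the propensity would be estimated (the paper uses CBPS) and the candidate set would be larger, with the weight vector found by quadratic programming rather than a grid.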


References

  • Akaike, H. (1973). Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika, 60, 255–265.

  • Ando, T., & Li, K. C. (2014). A model-averaging approach for high-dimensional regression. Journal of the American Statistical Association, 109, 254–265.

  • Ando, T., & Li, K. C. (2017). A weight-relaxed model averaging approach for high-dimensional generalized linear models. The Annals of Statistics, 45, 2654–2679.

  • Buckland, S. T., Burnham, K. P., & Augustin, N. H. (1997). Model selection: An integral part of inference. Biometrics, 53, 603–618.

  • Chen, J., & Shao, J. (2000). Nearest neighbor imputation for survey data. Journal of Official Statistics, 16, 113–131.

  • Cheng, P. E. (1994). Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association, 89, 81–87.

  • Claeskens, G., Croux, C., & van Kerckhoven, J. (2006). Variable selection for logistic regression using a prediction-focused information criterion. Biometrics, 62, 972–979.

  • Dardanoni, V., Modica, S., & Peracchi, F. (2011). Regression with imputed covariates: A generalized missing-indicator approach. Journal of Econometrics, 162, 362–368.

  • Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics, 13, 253–263.

  • Ding, X., Xie, J., & Yan, X. (2021). Model averaging for multiple quantile regression with covariates missing at random. Journal of Statistical Computation and Simulation, 91, 2249–2275.

  • Fan, J., & Li, R. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association, 99, 710–723.

  • Fang, F., Lan, W., Tong, J., & Shao, J. (2019). Model averaging for prediction with fragmentary data. Journal of Business and Economic Statistics, 37, 517–527.

  • Gao, Y., Zhang, X., Wang, S., & Zou, G. (2016). Model averaging based on leave-subject-out cross-validation. Journal of Econometrics, 192, 139–151.

  • Guo, D., Xue, L., & Hu, Y. (2017). Covariate-balancing-propensity-score-based inference for linear models with missing responses. Statistics and Probability Letters, 123, 139–145.

  • Hansen, B. E. (2007). Least squares model averaging. Econometrica, 75, 1175–1189.

  • Hansen, B. E. (2008). Least squares forecast averaging. Journal of Econometrics, 146, 342–350.

  • Hansen, B. E. (2014). Model averaging, asymptotic risk, and regressor groups. Quantitative Economics, 5, 495–530.

  • Hansen, B. E., & Racine, J. S. (2012). Jackknife model averaging. Journal of Econometrics, 167, 38–46.

  • Hjort, N. L., & Claeskens, G. (2003). Frequentist model average estimators. Journal of the American Statistical Association, 98, 879–899.

  • Huang, J. Z., Wu, C. O., & Zhou, L. (2002). Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika, 89, 111–128.

  • Imai, K., & Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B, 76, 243–263.

  • Kang, J., & Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22, 574–580.

  • King, G., Honaker, J., Joseph, A., & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review, 95, 49–69.

  • Li, K. C. (1987). Asymptotic optimality for \(C_p\), \(C_L\), cross-validation and generalized cross-validation: Discrete index set. The Annals of Statistics, 15, 958–975.

  • Liang, H., Wang, S., & Carroll, R. J. (2007). Partially linear models with missing response variables and error-prone covariates. Biometrika, 94, 185–198.

  • Liang, H., Wang, S., Robins, J. M., & Carroll, R. J. (2004). Estimation in partially linear models with missing covariates. Journal of the American Statistical Association, 99, 357–367.

  • Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd ed.). Hoboken, NJ: Wiley.

  • Liu, Q., & Okui, R. (2013). Heteroscedasticity-robust \(C_p\) model averaging. The Econometrics Journal, 16, 463–472.

  • Liu, Q., & Zheng, M. (2020). Model averaging for generalized linear model with covariates that are missing completely at random. The Journal of Quantitative Economics, 11, 25–40.

  • Lu, X., & Su, L. (2015). Jackknife model averaging for quantile regressions. Journal of Econometrics, 188, 40–58.

  • Mallows, C. L. (1973). Some comments on \(C_{p}\). Technometrics, 15, 661–675.

  • Newey, W. K., & McFadden, D. (1994). Large sample estimation and hypothesis testing. In R. F. Engle & D. L. McFadden (Eds.), Handbook of Econometrics (Vol. IV, pp. 2111–2245). Amsterdam: North-Holland.

  • Qin, Y., & Lei, Q. (2010). On empirical likelihood for linear models with missing responses. Journal of Statistical Planning and Inference, 140, 3399–3408.

  • Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.

  • Scharfstein, D. O., & Robins, R. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94, 1096–1120.

  • Schomaker, M., Wan, A. T. K., & Heumann, C. (2010). Frequentist model averaging with missing observations. Computational Statistics and Data Analysis, 54, 3336–3347.

  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.

  • Sun, Z., Su, Z., & Ma, J. (2014). Focused vector information criterion model selection and model averaging regression with missing response. Metrika, 77, 415–432.

  • Sun, Z., Wang, Q., & Dai, P. (2009). Model checking for partially linear models with missing responses at random. Journal of Multivariate Analysis, 100, 636–651.

  • Wan, A. T. K., Zhang, X., & Zou, G. (2010). Least squares model averaging by Mallows criterion. Journal of Econometrics, 156, 277–283.

  • Wang, Q., Linton, O., & Härdle, W. (2004). Semiparametric regression analysis with missing response at random. Journal of the American Statistical Association, 99, 334–345.

  • Wang, Q., & Rao, J. N. K. (2002). Empirical likelihood-based inference in linear models with missing data. Scandinavian Journal of Statistics, 29, 563–576.

  • Wei, Y., & Wang, Q. (2021). Cross-validation-based model averaging in linear models with response missing at random. Statistics and Probability Letters, 171, 108990.

  • Wei, Y., Wang, Q., & Liu, W. (2021). Model averaging for linear models with responses missing at random. Annals of the Institute of Statistical Mathematics, 73, 535–553.

  • Whittle, P. (1960). Bounds for the moments of linear and quadratic forms in independent variables. Theory of Probability and Its Applications, 5, 302–305.

  • Xie, J., Yan, X., & Tang, N. (2021). A model-averaging method for high-dimensional regression with missing responses at random. Statistica Sinica, 31, 1005–1026.

  • Xue, F., & Qu, A. (2021). Integrating multi-source block-wise missing data in model selection. Journal of the American Statistical Association, 116, 1914–1927.

  • Xue, L. (2009). Empirical likelihood for linear models with missing responses. Journal of Multivariate Analysis, 100, 1353–1366.

  • Xue, L., & Xue, D. (2011). Empirical likelihood for semiparametric regression model with missing response data. Journal of Multivariate Analysis, 102, 723–740.

  • Yuan, C., Wu, Y., & Fang, F. (2022). Model averaging for generalized linear models in fragmentary data prediction. Statistical Theory and Related Fields, 6, 344–352.

  • Yuan, Z., & Yang, Y. (2005). Combining linear regression models: When and how? Journal of the American Statistical Association, 100, 1202–1214.

  • Zeng, J., Cheng, W., Hu, G., & Rong, Y. (2018). Model averaging procedure for varying-coefficient partially linear models with missing responses. Journal of the Korean Statistical Society, 47, 379–394.

  • Zhang, X. (2013). Model averaging with covariates that are missing completely at random. Economics Letters, 121, 360–363.

  • Zhang, X., & Liang, H. (2011). Focused information criterion and model averaging for generalized additive partial linear models. The Annals of Statistics, 39, 174–200.

  • Zhang, X., & Liu, C. A. (2023). Model averaging prediction by K-fold cross-validation. Journal of Econometrics, 235, 280–301.

  • Zhang, X., Wan, A. T. K., & Zou, G. (2013). Model averaging by jackknife criterion in models with dependent data. Journal of Econometrics, 174, 82–94.

  • Zhang, X., & Wang, W. (2019). Optimal model averaging estimation for partially linear models. Statistica Sinica, 29, 693–718.

  • Zhang, X., Yu, D., Zou, G., & Liang, H. (2016). Optimal model averaging estimation for generalized linear models and generalized linear mixed-effects models. Journal of the American Statistical Association, 111, 1775–1790.

  • Zhang, X., Zou, G., & Carroll, R. J. (2015). Model averaging based on Kullback-Leibler distance. Statistica Sinica, 25, 1583–1598.

  • Zhang, Y., Tang, N., & Qu, A. (2020). Imputed factor regression for high-dimensional block-wise missing data. Statistica Sinica, 30, 631–651.

  • Zhu, R., Wan, A. T. K., Zhang, X., & Zou, G. (2019). A Mallows-type model averaging estimator for the varying-coefficient partially linear model. Journal of the American Statistical Association, 114, 882–892.


Acknowledgements

The authors would like to thank the reviewers and editors for their careful reading and constructive comments. The work of Zeng was supported by the Important Natural Science Foundation of Colleges and Universities of Anhui Province (No. KJ2021A0929) and the Research Project of Hefei Normal University (No. 2023XTQTZD28). The work of Hu was supported by the Important Natural Science Foundation of Colleges and Universities of Anhui Province (No. KJ2021A0930) and the Research Project of Hefei Normal University (No. 2023XTTDZD06).

Author information

Corresponding author: Guozhi Hu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 430 KB)

Appendix: Proofs of the main results


For convenience, we denote by C a generic constant whose value may differ at each appearance. Write \(\pi _{i}=\pi (X_{i}, \alpha )\) and \(\hat{\pi }_{i}=\pi (X_{i}, \hat{\alpha })\) for \(i=1,\ldots , n\). Let \(A^{(k)}=I_{n}-H^{(k)}\), \(T^{(k)}=D^{(k)}-I_{n}\), and \(G^{(k)}=T^{(k)}A^{(k)}\). Denote \(A(w)=\sum _{k=1}^{K}w_{k}A^{(k)}\), \(G(w)=\sum _{k=1}^{K}w_{k}G^{(k)}\), and \(M(w)=A'(w)G(w)+G'(w)A(w)+G'(w)G(w)\).

Lemma 1

Suppose that \(X_{i}, i=1,\ldots ,n\), are i.i.d. random vectors. If condition (C1) is satisfied, then \(\hat{\alpha } {\mathop {\longrightarrow }\limits ^{p}} \alpha _{0}\) and \(\hat{\alpha } - \alpha _{0}=O_{p}(n^{-1/2})\).

Proof

The proof of \(\hat{\alpha } {\mathop {\longrightarrow }\limits ^{p}} \alpha _{0}\) is the same as that of Theorem 1 in Guo et al. (2017), and \(\hat{\alpha } - \alpha _{0}=O_{p}(n^{-1/2})\) follows from Theorem 3.2 of Newey and McFadden (1994). \(\square\)

Lemma 2

Under condition (C1), we have

$$\begin{aligned} \Vert \hat{Y}-\check{Y} \Vert ^{2}=O_{p}(1). \end{aligned}$$

Proof

Simple calculations yield \(\check{y}_{i}-\mu _{0i}=s_{i}\), \(\hat{y}_{i}-\mu _{0i}=s_{i}+e_{i}\), where \(s_{i}=\dfrac{\delta _{i}}{\pi _{i}}(y_{i}-X_{i}'\beta )+(X_{i}'\beta -\mu _{0i})\), \(e_{i}=(1-\dfrac{\delta _{i}}{\hat{\pi }_{i}})X_{i}'(\hat{\beta }_{c}-\beta )+\delta _{i}(\dfrac{1}{\hat{\pi }_{i}}-\dfrac{1}{\pi _{i}})(y_{i}-X_{i}'\beta )\). Therefore, \(\hat{y}_{i}-\check{y}_{i}=e_{i}\) and \(\Vert \hat{Y}-\check{Y} \Vert ^{2}=\sum _{i=1}^{n}e_{i}^{2}\). By the Taylor series expansion, condition (C1) and Lemma 1, we have

$$\begin{aligned} \max \limits _{1\le i \le n} \left| \frac{1}{\hat{\pi }_{i}}-\frac{1}{\pi _{i}} \right|&=\max \limits _{1\le i \le n} \left| \left\{ \frac{1}{\pi ^{2}(X_{i}, \alpha )}\frac{\partial \pi (X_{i}, \alpha )}{\partial \alpha '}\right\} \Big |_{\alpha =\tilde{\alpha }}\cdot (\hat{\alpha }-\alpha _{0}) \right| \nonumber \\&\le \max \limits _{1\le i \le n}\left\| \frac{1}{\pi ^{2}(X_{i}, \alpha )}\frac{\partial \pi (X_{i}, \alpha )}{\partial \alpha } \Big |_{\alpha =\tilde{\alpha }}\right\| \cdot \Vert \hat{\alpha }-\alpha _{0} \Vert \nonumber \\&=O_{p}\left( n^{-1/2}\right) , \end{aligned}$$
(A.1)

where \(\tilde{\alpha }\) is a vector between \(\hat{\alpha }\) and \(\alpha _{0}\). This together with the facts that \(\pi _{i}\) is bounded away from 0 and \(\dfrac{\delta _{i}}{\hat{\pi }_{i}} =\dfrac{\delta _{i}}{\pi _{i}}+(\dfrac{\delta _{i}}{\hat{\pi }_{i}} -\dfrac{\delta _{i}}{\pi _{i}})\), it is easy to verify that

$$\begin{aligned} \max \limits _{1\le i \le n} \left| \frac{\delta _{i}}{\hat{\pi }_{i}}\right| \le \max \limits _{1\le i \le n} \left| \frac{1}{\pi _{i}}\right| + \max \limits _{1\le i \le n} \left| \frac{1}{\hat{\pi }_{i}}-\frac{1}{\pi _{i}} \right| =O_{p}(1). \end{aligned}$$
(A.2)

Combining (A.1), (A.2), the fact that \(\hat{\beta }_{c}-\beta =O_{p}(n^{-1/2})\), and condition (C1) proves the lemma. \(\square\)
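The rate in (A.1) can be illustrated numerically. The sketch below is an illustration only, with ingredients not taken from the paper: a hypothetical logistic propensity model with bounded covariates (so that \(\pi _{i}\) is bounded away from 0, as condition (C1) requires) and a root-n perturbation of the true parameter standing in for \(\hat{\alpha }\). Quadrupling n should roughly halve the maximal gap \(\max _{i}|1/\hat{\pi }_{i}-1/\pi _{i}|\).

```python
# Illustration of (A.1): max_i |1/pi_hat - 1/pi| shrinks at the root-n rate
# when the propensity parameter estimate is root-n consistent.
# Hypothetical logistic propensity model, not taken from the paper.
import numpy as np

rng = np.random.default_rng(1)

def max_gap(n, rng):
    x = rng.uniform(-1.0, 1.0, size=n)            # bounded covariate
    a0 = np.array([0.5, 1.0])                     # "true" parameters
    a_hat = a0 + rng.normal(size=2) / np.sqrt(n)  # root-n consistent stand-in
    pi0 = 1.0 / (1.0 + np.exp(-(a0[0] + a0[1] * x)))
    pih = 1.0 / (1.0 + np.exp(-(a_hat[0] + a_hat[1] * x)))
    return np.max(np.abs(1.0 / pih - 1.0 / pi0))

# mean maximal gap at n = 400 and n = 1600; their ratio should be near 2
gaps = [np.mean([max_gap(n, rng) for _ in range(200)]) for n in (400, 1600)]
print(gaps[0] / gaps[1])
```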

Lemma 3

Suppose that condition (C2) holds. Then

$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{A(w)\}\le 1, \end{aligned}$$
(A.3)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{G(w)\}=O_{p}(\bar{p}n^{-1}),\end{aligned}$$
(A.4)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{M(w)\}=O_{p}(\bar{p}n^{-1}). \end{aligned}$$
(A.5)

Proof

\(\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{A(w)\}\le \sup \limits _{w\in \mathcal {W}} \displaystyle \sum _{k=1}^{K}w_{k}\bar{\lambda }(A^{(k)})\le \max _{1\le k \le K}\bar{\lambda }(A^{(k)})\le 1.\)

Note that \(\bar{\lambda }(T^{(k)})=\max \limits _{1\le j \le n}\left\{ (1-h_{jj}^{(k)})^{-1}-1\right\} =\max \limits _{1\le j \le n} \dfrac{h_{jj}^{(k)}}{1-h_{jj}^{(k)}}\). By (A.3) and condition (C2), we have

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{G(w)\}&\le \sup \limits _{w\in \mathcal {W}} \displaystyle \sum _{k=1}^{K}w_{k}\bar{\lambda }(T^{(k)}A^{(k)})\le \max _{1\le k \le K}\bar{\lambda }(T^{(k)})\bar{\lambda }(A^{(k)})\\&\le \max _{1\le k \le K}\max _{1\le j \le n}\frac{h_{jj}^{(k)}}{1-h_{jj}^{(k)}}=O_{p}(\bar{p}n^{-1}). \end{aligned}$$

Combining (A.3) and (A.4), we obtain

$$\begin{aligned} \begin{aligned} \sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{M(w)\}&\le 2\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{A'(w)G(w)\}+\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{G'(w)G(w)\} \\&\le 2\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{A(w)\}\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{G(w)\}+\left[ \sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{G(w)\}\right] ^{2}=O_{p}(\bar{p}n^{-1}). \end{aligned} \end{aligned}$$

\(\square\)
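The bounds in Lemma 3 can be checked numerically for a single candidate model. The sketch below is an illustration under the assumption that \(H^{(k)}\) is a least-squares hat matrix and \(D^{(k)}=\text {diag}\{(1-h_{jj}^{(k)})^{-1}\}\): then \(\bar{\lambda }\{A^{(k)}\}\le 1\) holds exactly, and the spectral norm of \(G^{(k)}=T^{(k)}A^{(k)}\) is bounded by \(\max _{j}h_{jj}^{(k)}/(1-h_{jj}^{(k)})\), which is of order \(p/n\).

```python
# Numerical check of the eigenvalue bounds used in Lemma 3 for one
# candidate model (illustration; the design matrix is simulated).
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 5
Z = rng.normal(size=(n, p))

H = Z @ np.linalg.solve(Z.T @ Z, Z.T)        # hat matrix H^(k)
h = np.diag(H)

A = np.eye(n) - H                            # A^(k) = I_n - H^(k), a projection
T = np.diag(1.0 / (1.0 - h)) - np.eye(n)     # T^(k) = D^(k) - I_n

lam_A = np.linalg.eigvalsh(A).max()          # largest eigenvalue of A^(k)
s_G = np.linalg.norm(T @ A, 2)               # spectral norm of G^(k)
bound = (h / (1.0 - h)).max()                # max_j h_jj / (1 - h_jj)
print(lam_A, s_G, bound)
```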

Proof of Theorem 1

Denote \(e=(e_{1}, \ldots , e_{n})'\), \(S=(s_{1}, \ldots , s_{n})'\). \(\Sigma _{S}=E(SS')\) is a diagonal matrix whose ith diagonal component is

$$\begin{aligned} \sigma _{S,ii}^{2}=E(s_{i}^{2})=\sigma _{i}^{2}/ \pi _{i}+(1/\pi _{i}+1)(X_{i}'\beta -\mu _{0i})^{2}. \end{aligned}$$
(A.6)

Observe that CV(w) can be written as

$$\begin{aligned} CV(w)&= \hat{Y}'A'(w)A(w)\hat{Y}+\hat{Y}'M(w)\hat{Y}\\&= \hat{L}_{n}(w)+2\mu _{0}'A'(w)S-2S'H'(w)S+4e'A'(w)S +2\mu _{0}'A'(w)e \\&\quad +2e'A'(w)e+\mu _{0}'M(w)\mu _{0}+2\mu _{0}'M(w)S +2\mu _{0}'M(w)e\\&\quad +S'M(w)S+2S'M(w)e+e'M(w)e+S'S-2S'e-e'e, \end{aligned}$$

where the last three terms are independent of w. Therefore, the claim \(\hat{L}_{n}(\hat{w})/\{\inf _{w\in \mathcal {W}}\hat{L}_{n} (w)\}{\mathop {\longrightarrow }\limits ^{p}}1\) is valid if the following hold as \(n\rightarrow \infty\):

$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}|\mu _{0}'A'(w)S|/\check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.7)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}|S'H'(w)S|/\check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.8)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}|e'A'(w)S|/\check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.9)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}|\mu _{0}'A'(w)e|/\check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.10)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}|e'A'(w)e|/\check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.11)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}|\mu _{0}'M(w)\mu _{0}|/\check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.12)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}|\mu _{0}'M(w)S|/\check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.13)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}|\mu _{0}'M(w)e|/\check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.14)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}|S'M(w)S|/ \check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.15)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}|S'M(w)e|/ \check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.16)
$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}|e'M(w)e|/\check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.17)

and

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}}|\hat{L}_{n}(w)/ \check{R}_{n}(w)-1|=o_{p}(1). \end{aligned}$$
(A.18)

According to the definition of \(\check{R}_{n}(w)\), it is straightforward to show that

$$\begin{aligned} \check{R}_{n}(w)=\Vert A(w)\mu _{0}\Vert ^{2}+\text{tr}\{H'(w)H(w)\Sigma _{S}\}. \end{aligned}$$
(A.19)

From Chebyshev's inequality, Theorem 2 of Whittle (1960), (A.19), and conditions (C4) and (C5), for any \(\tau >0\), we observe that

$$\begin{aligned} P&\left\{ \sup \limits _{w\in \mathcal {W}}\vert \mu _{0}'A'(w)S\vert /\check{R}_{n}(w)>\tau \right\} \le \sum _{k=1}^{K}P\left\{ \vert \mu _{0}'A'(w_{k}^{0})S\vert > \tau \check{\xi }_{n}\right\} \\&\le \tau ^{-2N}\check{\xi }_{n}^{-2N}\displaystyle \sum _{k=1}^{K}E \{\mu _{0}'A'(w_{k}^{0})S\}^{2N} \le C C_{S}^{N}\tau ^{-2N} \check{\xi }_{n}^{-2N}\displaystyle \sum _{k=1}^{K} \Vert A(w_{k}^{0})\mu _{0}\Vert ^{2N} \\&\le C C_{S}^{N}\tau ^{-2N}\check{\xi }_{n}^{-2N} \displaystyle \sum _{k=1}^{K}\left\{ \check{R}_{n} (w_{k}^{0})\right\} ^{N} =o_{p}(1). \end{aligned}$$

Thus, (A.7) is proved. To show (A.8), it suffices to prove that

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}}\vert S'H'(w)S-\text {tr}\{H'(w)\Sigma _{S}\}\vert /\check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.20)

and

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}}\vert \text {tr}\{H'(w) \Sigma _{S}\}\vert /\check{R}_{n}(w)=o_{p}(1). \end{aligned}$$
(A.21)

Similar to the proof of (A.7), we see, for any \(\tau >0\), that

$$\begin{aligned} P&\left\{ \sup \limits _{w\in \mathcal {W}}\vert S'H'(w)S-\text {tr}\{H'(w)\Sigma _{S}\}\vert / \check{R}_{n}(w)>\tau \right\} \\&\le \displaystyle \sum _{k=1}^{K}P\left\{ \vert S'H'(w_{k}^{0})S -\text {tr}\{H'(w_{k}^{0})\Sigma _{S}\}\vert >\tau \check{\xi }_{n}\right\} \\&\le \tau ^{-2N}\check{\xi }_{n}^{-2N}\displaystyle \sum _{k=1}^{K}E \left[ S'H'(w_{k}^{0})S-\text {tr}\left\{ H'(w_{k}^{0}) \Sigma _{S}\right\} \right] ^{2N} \\&\le CC_{S}^{N}\tau ^{-2N}\check{\xi }_{n}^{-2N} \displaystyle \sum _{k=1}^{K}\left[ \text {tr} \left\{ H'(w_{k}^{0})\Sigma _{S}H(w_{k}^{0})\right\} \right] ^{N} \\&\le CC_{S}^{N}\tau ^{-2N}\check{\xi }_{n}^{-2N} \displaystyle \sum _{k=1}^{K}\left[ \check{R}_{n} (w_{k}^{0})\right] ^{N}=o_{p}(1), \end{aligned}$$

(A.20) is thus obtained. On the other hand, it follows from

$$\begin{aligned} \vert \text {tr}(H^{(k)}{'}\Sigma _{S})\vert&=\frac{1}{2}\vert \text {tr}(H^{(k)}{'}\Sigma _{S}+\Sigma _{S}H^{(k)})\vert \\&\le \frac{1}{2}\bar{\lambda }(H^{(k)}{'}\Sigma _{S} +\Sigma _{S}H^{(k)})\text {rank}(H^{(k)}{'}\Sigma _{S} +\Sigma _{S}H^{(k)}) \\&\le 2\bar{\lambda }(H^{(k)}) \bar{\lambda }(\Sigma _{S}) \text {rank}(H^{(k)}{'}\Sigma _{S}) \le 2C_{S}\cdot \bar{p} \end{aligned}$$

and condition (C6) that

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}}\vert \text {tr} (H'(w)\Sigma _{S})\vert /\check{R}_{n}(w)\le \check{\xi }_{n}^{-1}\max _{1\le k \le K}\left| \text {tr}(H^{(k)}{'}\Sigma _{S})\right| \le 2C_{S}\cdot \bar{p}\check{\xi }_{n}^{-1}=o_{p}(1). \end{aligned}$$

So (A.21) holds, and hence (A.8) is proved. By condition (C4), \(E\{(S'S)^{1/2}\}\le \{E(S'S)\}^{1/2}\le \sqrt{nC_{S}}\), and hence \((S'S)^{1/2}=O_{p}(\sqrt{n})\) and \(S'S=O_{p}(n)\). This, together with Lemma 2, Lemma 3 and conditions (C3) and (C6), yields

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}}|e'A'(w)S|/\check{R}_{n}(w)&\le \check{\xi }_{n}^{-1}\sup \limits _{w\in \mathcal {W}}\{S'A(w)ee'A'(w)S\}^{1/2}\\&\le \check{\xi }_{n}^{-1} (e'e)^{1/2}(S'S)^{1/2}\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{A(w)\}\\&\le \check{\xi }_{n}^{-1}O_{p}(1)O_{p}(\sqrt{n})\cdot 1=o_{p}(1),\\ \sup \limits _{w\in \mathcal {W}}|\mu _{0}'M(w)S|/\check{R}_{n}(w)&\le \check{\xi }_{n}^{-1} (\mu _{0}'\mu _{0})^{1/2}(S'S)^{1/2}\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{M(w)\}\\&=\check{\xi }_{n}^{-1}O(\sqrt{n})O_{p} (\sqrt{n})O_{p}(\bar{p}n^{-1})=o_{p}(1),\\ \sup \limits _{w\in \mathcal {W}}|S'M(w)S|/\check{R}_{n}(w)&\le \check{\xi }_{n}^{-1} (S'S)\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{M(w)\}\\&=\check{\xi }_{n}^{-1}O_{p}(n)O_{p}(\bar{p}n^{-1})=o_{p}(1),\\ \sup \limits _{w\in \mathcal {W}}|S'M(w)e|/\check{R}_{n}(w)&\le \check{\xi }_{n}^{-1} (S'S)^{1/2}(e'e)^{1/2}\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{M(w)\}\\&\le \check{\xi }_{n}^{-1}O_{p}(\sqrt{n}) O_{p}(1)O_{p}(\bar{p}n^{-1})=o_{p}(1). \end{aligned}$$

So (A.9), (A.13), (A.15) and (A.16) hold. By similar arguments, it can be shown that

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}}|\mu _{0}'A'(w)e|/\check{R}_{n}(w)&\le \check{\xi }_{n}^{-1}(\mu _{0}'\mu _{0})^{1/2}(e'e)^{1/2} \sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{A(w)\}\\&\le \check{\xi }_{n}^{-1} O_{p}(\sqrt{n})O_{p}(1)\cdot 1=o_{p}(1),\\ \sup \limits _{w\in \mathcal {W}}|e'A'(w)e|/\check{R}_{n}(w)&\le \check{\xi }_{n}^{-1} (e'e)\sup \limits _{w\in \mathcal {W}} \bar{\lambda }\{A(w)\}\le \check{\xi }_{n}^{-1} O_{p}(1)\cdot 1=o_{p}(1),\\ \sup \limits _{w\in \mathcal {W}}|\mu _{0}'M(w)\mu _{0}|/\check{R}_{n}(w)&\le \check{\xi }_{n}^{-1} (\mu _{0}'\mu _{0}) \sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{M(w)\}\\&=\check{\xi }_{n}^{-1}O(n)O_{p}(\bar{p}n^{-1})=o_{p}(1),\\ \sup \limits _{w\in \mathcal {W}}|\mu _{0}'M(w)e|/\check{R}_{n}(w)&\le \check{\xi }_{n}^{-1} (\mu _{0}'\mu _{0})^{1/2} (e'e)^{1/2}\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{M(w)\}\\&=\check{\xi }_{n}^{-1}O(\sqrt{n})O_{p}(1)O_{p}(\bar{p}n^{-1})=o_{p}(1),\\ \sup \limits _{w\in \mathcal {W}}|e'M(w)e|/\check{R}_{n}(w)&\le \check{\xi }_{n}^{-1} (e'e)\sup \limits _{w\in \mathcal {W}}\bar{\lambda }\{M(w)\}\\&\le \check{\xi }_{n}^{-1}O_{p}(1)O_{p}(\bar{p}n^{-1})=o_{p}(1). \end{aligned}$$

Therefore, (A.10), (A.11), (A.12), (A.14) and (A.17) are valid.

Next we prove (A.18). Because

$$\begin{aligned} \hat{L}_{n}(w)&=\Vert \mu _{0}-\hat{\mu }(w)\Vert ^{2} =\Vert \mu _{0}-\check{\mu }(w)+\check{\mu }(w)-\hat{\mu }(w)\Vert ^{2}\\&\le \check{L}_{n}(w)+2\{\check{L}_{n}(w)\}^{1/2} \Vert \check{\mu }(w)-\hat{\mu }(w)\Vert +\Vert \check{\mu }(w) -\hat{\mu }(w)\Vert ^{2}, \end{aligned}$$

and

$$\begin{aligned} \Vert \check{\mu }(w)-\hat{\mu }(w)\Vert ^{2}=\Vert H(w)\check{Y} -H(w)\hat{Y}\Vert ^{2}\le \Vert \check{Y}-\hat{Y}\Vert ^{2}, \end{aligned}$$

it is sufficient to verify that

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}}|\check{L}_{n}(w)/ \check{R}_{n}(w)-1|=o_{p}(1), \end{aligned}$$
(A.22)

and

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}} \Vert \check{Y}-\hat{Y}\Vert ^{2}/\check{R}_{n}(w)=o_{p}(1). \end{aligned}$$
(A.23)

Additionally, note that

$$\begin{aligned} |\check{L}_{n}(w)-\check{R}_{n}(w)|=\left| \Vert H(w)S\Vert ^{2}-\text{tr}\{H'(w)H(w)\Sigma _{S}\} -2\mu _{0}'A'(w)H(w)S\right| , \end{aligned}$$

so (A.22) is implied by

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}}\left| \Vert H(w)S\Vert ^{2}-\text{tr}\{H'(w)H(w)\Sigma _{S}\}\right| / \check{R}_{n}(w)=o_{p}(1), \end{aligned}$$
(A.24)

and

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}}\left| \mu _{0}'A'(w)H(w) S\right| /\check{R}_{n}(w)=o_{p}(1). \end{aligned}$$
(A.25)

To prove (A.24), we follow the steps used to show (A.20): for any \(\tau >0\),

$$\begin{aligned}&P\left\{ \sup \limits _{w\in \mathcal {W}}\left| \Vert H(w)S\Vert ^{2}-\text{tr}\{H'(w)H(w)\Sigma _{S}\}\right| / \check{R}_{n}(w)>\tau \right\} \\&\quad \le P\left\{ \sup \limits _{w\in \mathcal {W}}\displaystyle \sum _{t=1}^{K} \displaystyle \sum _{k=1}^{K}w_{t}w_{k} \left| S'H^{(t)}{'}H^{(k)}S-\text{tr}(H^{(t)}{'}H^{(k)} \Sigma _{S})\right|>\tau \check{\xi }_{n}\right\} \\&\quad \le \displaystyle \sum _{t=1}^{K} \displaystyle \sum _{k=1}^{K}P\left\{ \left| S'H'(w_{t}^{0}) H(w_{k}^{0})S-\text {tr}\{H'(w_{t}^{0})H(w_{k}^{0}) \Sigma _{S}\}\right| >\tau \check{\xi }_{n}\right\} \\&\quad \le \tau ^{-2N}\check{\xi }_{n}^{-2N} \displaystyle \sum _{t=1}^{K}\displaystyle \sum _{k=1}^{K}E \left[ S'H'(w_{t}^{0})H(w_{k}^{0})S-\text {tr}\{H'(w_{t}^{0}) H(w_{k}^{0})\Sigma _{S}\}\right] ^{2N} \\&\quad \le CC_{S}^{N}\tau ^{-2N}\check{\xi }_{n}^{-2N} \displaystyle \sum _{t=1}^{K}\displaystyle \sum _{k=1}^{K} \left[ \text {tr}\left\{ H'(w_{t}^{0})H(w_{k}^{0}) \Sigma _{S}H'(w_{k}^{0})H(w_{t}^{0})\right\} \right] ^{N} \\&\quad \le CC_{S}^{N}\tau ^{-2N}\check{\xi }_{n}^{-2N} \displaystyle \sum _{t=1}^{K}\displaystyle \sum _{k=1}^{K} \left[ \text {tr}\left\{ H(w_{k}^{0}) \Sigma _{S}H'(w_{k}^{0})\right\} \right] ^{N} \\&\quad \le CC_{S}^{N}\tau ^{-2N}\check{\xi }_{n}^{-2N}K \displaystyle \sum _{k=1}^{K}\left\{ \check{R}_{n} (w_{k}^{0})\right\} ^{N} =o_{p}(1). \end{aligned}$$

So (A.24) is satisfied. The fact that \(\text {tr}(H^{(t)}{'}H^{(k)})\le \left\{ \text {tr}(H^{(t)}{'}H^{(t)})\right\} ^{1/2} \left\{ \text {tr}(H^{(k)}{'}H^{(k)})\right\} ^{1/2}\le \sqrt{p_{t}p_{k}}\), together with conditions (C4) and (C6), demonstrates that

$$\begin{aligned}&\sup \limits _{w\in \mathcal {W}}\left| \text{tr}\{H'(w) H(w)\Sigma _{S}\}\right| /\check{R}_{n}(w)\\&\quad \le \sup \limits _{w\in \mathcal {W}} \left| \bar{\lambda }(\Sigma _{S})\text{tr} \left( \displaystyle \sum _{t=1}^{K}\displaystyle \sum _{k=1}^{K}w_{t}w_{k}H^{(t)}{'}H^{(k)}\right) \right| \Bigg /\check{\xi }_{n}\\&\quad \le C_{S}\cdot \bar{p}/\check{\xi }_{n}=o_{p}(1), \end{aligned}$$

which, along with (A.24), implies that \(\sup \limits _{w\in \mathcal {W}}\Vert H(w)S\Vert ^{2}/\check{R}_{n}(w)=o_{p}(1)\). This, together with the Cauchy-Schwarz inequality and (A.19), indicates that

$$\begin{aligned} \sup \limits _{w\in \mathcal {W}}\left| \mu _{0}'A'(w)H(w)S\right| / \check{R}_{n}(w)&\le \left\{ \Vert A(w)\mu _{0}\Vert ^{2}\Vert H(w)S\Vert ^{2}/\check{R}^{2}_{n}(w)\right\} ^{1/2}\\&\le \left\{ \Vert H(w)S\Vert ^{2}/\check{R}_{n}(w) \right\} ^{1/2}=o_{p}(1). \end{aligned}$$

Thus, (A.25) follows. By Lemma 2 and \(\check{\xi }_{n}\rightarrow \infty\), (A.23) is readily verified. Theorem 1 is then proved. \(\square\)

Proof of Theorem 2

When the true model is indeed linear, i.e., \(\mu _{0i}=X_{i}'\beta\), we have \(\hat{y}_{i}-\mu _{0i}=\zeta _{i}\), where \(\zeta _{i}=\frac{\delta _{i}}{\pi _{i}}\epsilon _{i} +(1-\frac{\delta _{i}}{\hat{\pi }_{i}})X_{i}' (\hat{\beta }_{c}-\beta )+(\frac{1}{\hat{\pi }_{i}} -\frac{1}{\pi _{i}})\delta _{i}\epsilon _{i}\). Let \(\epsilon =(\epsilon _{1}, \ldots , \epsilon _{n})'\) and \(\zeta =(\zeta _{1}, \ldots , \zeta _{n})'\). From condition (C1) and the assumption that \(\sigma _{i}^{2}\) is finite, we obtain \(\textbf{X}'\epsilon =O_{p}(n^{1/2})\) and hence \(\textbf{X}'\zeta =O_{p}(n^{1/2})\). It is seen that

$$\begin{aligned} CV(w)&=\Vert \{I_{n}-H(w)\}\hat{Y}\Vert ^{2}+w'\Omega w \nonumber \\&=\Vert \zeta \Vert ^{2}+\{\bar{\beta }(w)-\beta \}'\textbf{X}' \textbf{X}\{\bar{\beta }(w)-\beta \}-2\zeta '\textbf{X} \{\bar{\beta }(w)-\beta \}+w'\Omega w, \end{aligned}$$
(A.26)

where \(\Omega\) is a \(K\times K\) matrix whose \((k,j)\)th element is \(\Omega _{kj}=\hat{Y}'(I_{n}-H^{(k)})'(T^{(k)}+T^{(j)}+T^{(k)} T^{(j)})(I_{n}-H^{(j)})\hat{Y}\). By conditions (C1) and (C2), it follows that \(\Omega _{kj}=O_{p}(1)\). Hence, for any \(w\in \mathcal {W}\),

$$\begin{aligned} w' \Omega w=O_{p}(1). \end{aligned}$$
(A.27)

Let \(k\in \{1,\ldots , K\}\) be the index of the true model. Then, from Guo et al. (2017), we know that \(\bar{\beta }^{(k)}-\beta =O_{p}(n^{-1/2})\), which, together with (A.26), (A.27) and condition (C1), implies \(CV(w_{k}^{0})=\Vert \zeta \Vert ^{2}+\eta _{n}(w_{k}^{0})\) with

$$\begin{aligned} \eta _{n}(w_{k}^{0})=O_{p}(1). \end{aligned}$$
(A.28)

Therefore, \(CV(\hat{w})\le CV(w_{k}^{0})=\Vert \zeta \Vert ^{2}+\eta _{n}(w_{k}^{0})\), which, together with (A.26), implies

$$\begin{aligned} \eta _{n}(w_{k}^{0})\ge \{\bar{\beta }(\hat{w})-\beta \}' \textbf{X}'\textbf{X}\{\bar{\beta }(\hat{w})-\beta \} -2\zeta '\textbf{X}\{\bar{\beta }(\hat{w})-\beta \} +\hat{w}'\Omega \hat{w}. \end{aligned}$$
(A.29)

Let \(\Psi _{n}=n^{-1}\textbf{X}'\textbf{X}\). From (A.29), we obtain

$$\begin{aligned}&\underline{\lambda }(\Psi _{n})\Vert \sqrt{n}\{\bar{\beta }(\hat{w}) -\beta \}\Vert ^{2}\le \{\bar{\beta }(\hat{w})-\beta \}'\textbf{X}' \textbf{X}\{\bar{\beta }(\hat{w})-\beta \}\\&\quad \le \eta _{n}(w_{k}^{0})+2\zeta '\textbf{X} \{\bar{\beta }(\hat{w})-\beta \}-\hat{w}'\Omega \hat{w}\\&\quad \le \eta _{n}(w_{k}^{0})+2 \Vert n^{-1/2} \zeta '\textbf{X}\Vert \Vert \sqrt{n}\{\bar{\beta }(\hat{w})-\beta \}\Vert -\hat{w}'\Omega \hat{w}, \end{aligned}$$

and so

$$\begin{aligned}&\underline{\lambda }(\Psi _{n})\left[ \Vert \sqrt{n}\{\bar{\beta } (\hat{w})-\beta \}\Vert -\underline{\lambda }^{-1}(\Psi _{n}) \Vert n^{-1/2} \zeta '\textbf{X}\Vert \right] ^{2}\nonumber \\&\quad \le \eta _{n}(w_{k}^{0})+\{\underline{\lambda } (\Psi _{n})\}^{-1}\Vert n^{-1/2} \zeta {'}\textbf{X}\Vert ^{2} -\hat{w}'\Omega \hat{w}. \end{aligned}$$
(A.30)

Write \(a_{n}=\eta _{n}(w_{k}^{0})+\{\underline{\lambda } (\Psi _{n})\}^{-1}\Vert n^{-1/2} \zeta {'}\textbf{X} \Vert ^{2}-\hat{w}'\Omega \hat{w}\); then (A.30) is equivalent to

$$\begin{aligned} \Vert \sqrt{n}\{\bar{\beta }(\hat{w})-\beta \} \Vert&\in \Big [- \{\underline{\lambda }^{-1}(\Psi _{n})a_{n}\}^{1/2} +\underline{\lambda }^{-1}(\Psi _{n})\Vert n^{-1/2} \zeta {'} \textbf{X}\Vert {,} \\&\qquad \{\underline{\lambda }^{-1}(\Psi _{n})a_{n}\}^{1/2} +\underline{\lambda }^{-1}(\Psi _{n})\Vert n^{-1/2} \zeta {'}\textbf{X}\Vert \Big ], \end{aligned}$$

which, together with (A.27), (A.28) and \(\textbf{X}'\zeta =O_{p}(n^{1/2})\), means \(\sqrt{n}\{\bar{\beta }(\hat{w})-\beta \}=O_{p}(1)\). This completes the proof. \(\square\)


Cite this article

Zeng, J., Cheng, W. & Hu, G. Jackknife model averaging for linear regression models with missing responses. J. Korean Stat. Soc. (2024). https://doi.org/10.1007/s42952-024-00259-2
