Abstract
This paper is concerned with the spatial single-index autoregressive model (SSIM), in which the spatial lag effect enters the model linearly and the relationship between the response and covariates is a nonparametric function of a linear combination of multivariate regressors. The model addresses challenges related to the curse of dimensionality and interactions among non-independent variables in spatial data. Local Walsh-average regression has proven to be a robust and efficient method for single-index models. We extend this approach to the spatial domain and propose an estimation strategy in which the nonparametric component is estimated by a local Walsh-average approach and the parametric part by the Walsh-average method. Under specific assumptions, we establish the asymptotic properties of both the parametric and nonparametric estimators. Additionally, we propose a robust shrinkage method, termed regularized local Walsh-average (RLWA), that performs robust parametric variable selection and robust nonparametric component estimation simultaneously. Theoretical analysis shows that RLWA enjoys consistency in variable selection and the oracle property in estimation. We further propose a tuning-parameter selection procedure based on a robust BIC-type criterion that retains the oracle property. The effectiveness of the proposed estimation procedure is evaluated through three Monte Carlo simulations and real-data applications, demonstrating good finite-sample performance.
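For concreteness, the data-generating mechanism described above can be sketched numerically. This is a minimal illustration, assuming a row-standardized neighbour matrix \(W\) on a line graph, a sine link in place of the unknown function \(g\), and illustrative values of the spatial parameter \(\rho\) and index \(\beta\); none of these specific choices come from the paper itself.

```python
import numpy as np

# Hypothetical sketch of data from a spatial single-index autoregressive
# model: Y = rho*W*Y + g(X @ beta) + eps, so that
# Y = (I - rho*W)^{-1} (g(X @ beta) + eps).
# The weight matrix, link function, rho, and beta are illustrative only.

def line_weight_matrix(n):
    """Row-standardized weight matrix of a simple line graph."""
    W = np.zeros((n, n))
    for i in range(n):
        if i > 0:
            W[i, i - 1] = 1.0
        if i < n - 1:
            W[i, i + 1] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def simulate_ssim(n=100, rho=0.3, seed=0):
    rng = np.random.default_rng(seed)
    beta = np.array([2.0, 1.0]) / np.sqrt(5.0)  # ||beta|| = 1 for identifiability
    X = rng.normal(size=(n, 2))
    eps = rng.normal(scale=0.5, size=n)
    W = line_weight_matrix(n)
    # The true link g is unknown in practice; sin is used only for illustration.
    Y = np.linalg.solve(np.eye(n) - rho * W, np.sin(X @ beta) + eps)
    return Y, X, W, beta
```

The unit-norm constraint on \(\beta\) reflects the usual identifiability condition for single-index models; the spatial lag is handled by solving the reduced-form system rather than simulating \(Y\) recursively.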
Data Availability
The data that support the findings of this study are available from the corresponding author upon request.
References
Basile R (2008) Regional economic growth in Europe: a semiparametric spatial dependence approach. Pap Reg Sci 87(4):527–544
Carroll RJ, Fan J, Gijbels I, Wand MP (1997) Generalized partially linear single-index models. J Am Stat Assoc 92:477–489
Delecroix M, Hristache M, Patilea V (2006) On semiparametric estimation in single-index regression. J Stat Plan Inference 136:730–769
Du J, Sun X, Cao R, Zhang Z (2018) Statistical inference for partially linear additive spatial autoregressive models. Spat Stat 25:52–67
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan Y, Härdle WK, Wang W, Zhu L (2018) Single-index-based CoVaR with very high-dimensional covariates. J Bus Econ Stat 36(2):212–226
Feng L, Zou C, Wang Z (2012) Local Walsh-average regression. J Multivar Anal 106:36–48
Hettmansperger TP, McKean JW (2011) Robust nonparametric statistical methods, 2nd edn. Chapman-Hall, New York
Kelejian HH, Prucha IR (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J Real Estate Finance Econ 17:99–121
Lee LF (2007) GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J Econom 137:489–514
Liu X, Chen J, Cheng S (2018) A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat Stat 25:86–104
Peng H, Huang T (2011) Penalized least squares for single index models. J Stat Plan Inference 141:1362–1379
Horn RA, Johnson CR (1985) Matrix analysis. Cambridge University Press, Cambridge
Shang S, Zou C, Wang Z (2012) Local Walsh-average regression for semiparametric varying-coefficient models. Stat Probab Lett 82:1815–1822
Song Y, Li Z, Fang M (2022) Robust variable selection based on penalized composite quantile regression for high-dimensional single-index models. Mathematics 10(12):2000
Su L (2012) Semiparametric GMM estimation of spatial autoregressive models. J Econom 167:543–560
Su L, Jin S (2010) Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J Econom 157:18–33
Su L, Yang Z (2009) Instrumental variable quantile estimation of spatial autoregressive models. Working Paper, Singapore Management University, Singapore
Sun Y (2017) Estimation of single-index model with spatial interaction. Reg Sci Urban Econ 62:36–45
Terpstra J, McKean JW (2005) Rank-based analysis of linear models using R. J Stat Softw 14:1–26
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 58:267–288
Wang H (2009) Bayesian estimation and variable selection for single index models. Comput Stat Data Anal 53:2617–2627
Wang HJ, Zhu Z, Zhou J (2009) Quantile regression in partially linear varying coefficient models. Ann Stat 37:3841–3866
Wang L (2009) Wilcoxon-type generalized Bayesian information criterion. Biometrika 96:163–173
Wang L, Kai B, Li R (2009) Local rank inference for varying coefficient models. J Am Stat Assoc 104:1631–1645
Wang L, Yang L (2006) Spline-backfitted kernel smoothing of nonlinear additive autoregression model. Ann Stat 35:2474–2503
Wu TZ, Yu K, Yu Y (2010) Single-index quantile regression. J Multivar Anal 101:1607–1621
Xie T, Cao R, Jiang D (2020) Variable selection for spatial autoregressive models with a diverging number of parameters. Stat Pap 61:1125–1145
Xia Y, Härdle WK (2006) Semi-parametric estimation of partially linear single-index models. J Multivar Anal 97:1162–1184
Yang J, Lu F, Yang H (2019) Local Walsh-average-based estimation and variable selection for single-index models. Sci China Math 62:1977–1996
Zeng P, He T, Zhu Y (2012) A lasso-type approach for estimation and variable selection in single index models. J Comput Graph Stat 21:92–109
Zhao W, Jiang X, Lian H (2018) A principal varying-coefficient model for quantile regression: Joint variable selection and dimension reduction. Comput Stat Data Anal 127:269–280
Zhao W, Lian H, Liang H (2017) Gee analysis for longitudinal single-index quantile regression. J Stat Plan Inference 187:78–102
Zhao W, Zhou Y, Lian H (2018) Time-varying quantile single-index model for multivariate responses. Comput Stat Data Anal 127:32–49
Zhu L, Qian L, Lin J (2011) Variable selection in a class of single-index models. Ann Inst Stat Math 63:1277–1293
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533
Funding
This research is supported by the Fundamental Research Funds for the Central Universities (No. 23CX03012A) and the National Key Research and Development Program of China (2021YFA1000102).
Author information
Authors and Affiliations
Contributions
Yunquan Song: Data curation, Formal analysis, Software, Visualization, Writing - review and editing. Hang Su: Conceptualization, Software, Visualization, Writing - review and editing. Minmin Zhan: Language refinement and the inclusion of supplementary content. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflicts of Interest
The authors declare they have no financial interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Theorem 3.1 Let \(\tilde{\theta }\) be the initial estimate of \(\theta _0\). For ease of presentation, we define the following notation:
With these notations, we rewrite \(\Phi _n(\theta )\) as follows:
Denote by \(S_n^*\left( \theta ^*\right)\) the gradient function of \(\Phi _n^*\left( \theta ^*\right)\); it follows that
Writing \(U_n \triangleq \delta _n^{-1}\left( S_n^*\left( \theta ^*\right) -S_n^*(0)\right)\), we show that
where
Note that \(H_n\left( P_i, P_j\right)\) is symmetric in its arguments, i.e., \(H_n\left( P_i, P_j\right) =H_n\left( P_j, P_i\right)\). Taking into account the fact that
By Lemma A.1 of Wang et al. (2009), it can be shown that \(U_n=E\left\{ H_n\left( P_i, P_i\right) \right\} +o_p(1)\). Further combining the \(\sqrt{n}\)-consistency assumption of \(\tilde{\theta }\) with the conditions (C.3.1)-(C.3.6), it is not difficult to derive the following result:
Consequently, we can obtain that
The quadratic function can be defined as
Then, for any \(\epsilon >0\) and \(c>0\), we can get
In fact, in view of Eq. (A.1), we have
Therefore, following the same lines as Theorem A.3.7 in Hettmansperger and McKean (2011), we have that (A.2) holds. Next, we specify \(\hat{\theta }^*\) and \(\bar{\theta }^*\) as the minimizers of \(\Phi _n^*\left( \theta ^*\right)\) and \(R_n^*\left( \theta ^*\right)\), respectively. Under the conditions (C.3.1)-(C.3.6), following an analysis similar to that of Theorem 3.5.5 in Hettmansperger and McKean (2011), we can show that
By the expression of \(R_n^*\left( \theta ^*\right)\), it suffices to show that
On the other hand, let \(S_n(0) \triangleq -\delta _n^{-2} S_n^*(0)\). Then we have
where
Similarly, we can verify that \(\textrm{E}\left\{ \left\| H_{n 1}\left( P_i, P_j\right) \right\| ^2\right\} =o(n)\) and \(\textrm{E}\left\{ H_{n 1}\left( P_i, P_j\right) \right\} =0\). Therefore, we obtain from Lemma A.1 of Wang et al. (2009) that
Under the \(\sqrt{n}\)-consistency assumption of \(\tilde{\beta }\), the conditions (C.3.1)-(C.3.6), and some calculations similar to those used in the proof of Theorem 3.2 in Wang et al. (2009), we have
Then by the Lindeberg-Feller central limit theorem, we have
For the term \(S_{n 2}^*(0)\), since
For the term \(S_{n 2}(0)\), from the results of Theorem 1, condition (C.3.5) and the bandwidth assumption, we have \(\Delta _i=o_p(1)\). Similarly, \(\Delta _j=o_p(1)\) also holds. Then, it is not difficult to show that
Consequently, from (A.3)-(A.7), we finish the proof.
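The Walsh-average criterion underlying the objective \(\Phi_n\) can be illustrated in the simplest location setting: minimizing \(\sum_{i\le j}|e_i+e_j|\) over a location shift yields the median of the pairwise (Walsh) averages, i.e. the Hodges-Lehmann estimator. The following sketch, with invented data, is only a numerical check of this well-known fact, not the paper's full estimation procedure.

```python
import numpy as np
from itertools import combinations_with_replacement

def walsh_average_loss(theta, y):
    """Walsh-average objective: sum over pairs i <= j of |(y_i - theta) + (y_j - theta)|."""
    r = y - theta
    return sum(abs(r[i] + r[j])
               for i, j in combinations_with_replacement(range(len(y)), 2))

def hodges_lehmann(y):
    """Median of the Walsh averages (y_i + y_j)/2, a minimizer of the loss above."""
    walsh = [(y[i] + y[j]) / 2.0
             for i, j in combinations_with_replacement(range(len(y)), 2)]
    return float(np.median(walsh))
```

For example, with data [1, 2, 3, 10] the estimate is the median of the ten Walsh averages, and the loss there is no larger than at nearby candidate values; this robustness to the outlying observation is what the local Walsh-average approach carries over to the nonparametric component.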
Proof of Theorem 3.2 We have \(\hat{\theta }-\theta _0=O_p\left( n^{-1 / 2}\right)\) from Theorem 3.1. Since X has bounded support, we can derive the asymptotic normality of \(\hat{g}(u,\beta )\) within the framework of the varying-coefficient model. Therefore, as the proof is similar to that of Theorem 3 in Shang et al. (2012), we omit it here.
Proof of Theorem 4.1 We first introduce the following lemma to show that the penalized RLWA estimator possesses the sparsity property \(\hat{\beta }_{I I}^\lambda =0\).
Lemma A.1 Under the conditions (C.3.1)-(C.3.6) given in Section 3, if \(\lambda \rightarrow 0\) and \(\sqrt{n} \lambda \rightarrow \infty\) as \(n \rightarrow \infty\), then, with probability tending to 1, for any constant \(C>0\), we have
Proof. Using notation similar to that in the proof of Theorem 3.1, let \(\delta _1=c_n^{-1}\left( \beta _I-\beta _{0 I}\right) , \delta _2=c_n^{-1}\left( \beta _{I I}-\beta _{0 I I}\right)\) and \(\delta =\left( \delta _1^{\textrm{T}}, \delta _2^{\textrm{T}}\right) ^{\textrm{T}}\). From the expression of \(Q_\lambda (\beta )\), it is easy to see that
The third equality follows from (A.2), and the fourth holds by the definition of \(R_n\left( \beta ^*\right)\). Furthermore, by combining (A.6) and (A.7), we have \(S_n^*(0)=O_p(1)\). This means \(S_n(0)=O_p\left( c_n^2\right)\) because \(S_n^*(0)=-c_n^{-2} S_n(0)\). Note that \(\Vert \delta \Vert =O_p(1)\); it follows that \(D_1(\delta )=O_p\left( c_n^2\right)\). Similarly, we can verify that \(D_2(\delta )=O_p\left( c_n^2\right)\). For the last term of Eq. (A.9), since \(p_{\lambda }(0)=0\), the mean value theorem gives
where \(\left| \xi _k\right| \in \left( 0,\left| \beta _k\right| \right)\) for \(k=d+1, d+2, \ldots , p\). Considering the assumption \(\sqrt{n} \lambda \rightarrow \infty\), together with the fact that \(\sqrt{\log (n)} \rightarrow \infty\) and \(\sqrt{n / \log (n)} \rightarrow \infty\) as \(n \rightarrow \infty\), it is clear that \(\sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right)\) is of a higher order than \(O_p\left( c_n^2\right)\). That is, \(Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) -Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \beta _{I I}^{\textrm{T}}\right) ^{\textrm{T}}\right)\) is dominated by the negative term \(-\sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right)\) for large \(n\). This completes the proof of Lemma A.1, and we now continue with the proof of Theorem 4.1. Lemma A.1 shows that part (i) of the theorem holds. Then, using the same notation as in the proof of Theorem 3.1, we focus on proving (ii). Under the conditions of Theorem 4.1, for any given \(\beta _I\) satisfying \(\left\| \beta _I-\beta _{0 I}\right\| =O_p\left( n^{-1 / 2}\right)\), Lemma A.1 yields
Consider \(Q_\lambda (\beta )=L_n(\beta )+\sum _{k=1}^p p_\lambda \left( \left| \beta _k\right| \right)\), and then the following canonical equation must be satisfied:
As \(L_n(\beta )\) can be written as \(L_n^*\left( \beta ^*\right)\) with \(\beta ^*=\sqrt{n}\left( \beta -\beta _0\right)\), then
In addition, through Eq. (A.1), we can easily find that \(\sqrt{n} S_n\left( \beta ^*\right) =\sqrt{n} S_n(0)+2 \tau c_n \Sigma \beta ^*\). Let \(e_k\) be a d-dimensional vector with the k-th component being equal to 1 and the other components being equal to 0, and \(S_{n d}(0)\) be a vector including the first d components of \(S_n(0)\). Through calculation, we can prove that Eq. (A.10) is equal to
where the last equation is obtained by Taylor expansion, and \(\Sigma _d\) is defined in Theorem 4.1. Note that \(c_n=n^{-1 / 2}\); it follows that
Eventually, by defining c and \(\Lambda\), we can get the following equation
Recall that \(S_n^*(0) {\mathop {\rightarrow }\limits ^{d}} N\left( 0, \textrm{E}\left\{ g^{\prime }\left( X^{\textrm{T}} \beta _0\right) ^2[2 M(\varepsilon )-1]^2 \tilde{X} \tilde{X}^{\textrm{T}}\right\} \right)\) from the proof of Theorem 3.1. By the central limit theorem and Slutsky’s theorem, we complete the proof.
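The penalty \(p_\lambda\) in \(Q_\lambda(\beta)\) is generic in the proof above; a standard choice satisfying \(p_\lambda(0)=0\), used in this literature, is the SCAD penalty of Fan and Li (2001). The sketch below follows that paper's closed form and its conventional default \(a=3.7\); it is illustrative rather than a statement of the present paper's exact implementation.

```python
import numpy as np

def scad_penalty(beta_abs, lam, a=3.7):
    """SCAD penalty p_lambda(|beta|) of Fan and Li (2001), evaluated elementwise:
    linear near zero, quadratic transition, then constant for large |beta|."""
    beta_abs = np.asarray(beta_abs, dtype=float)
    return np.where(
        beta_abs <= lam,
        lam * beta_abs,
        np.where(
            beta_abs <= a * lam,
            (2 * a * lam * beta_abs - beta_abs**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )
```

Because the penalty is constant beyond \(a\lambda\), large coefficients are left unshrunk, which is what drives the oracle property invoked in Theorem 4.1; the three pieces join continuously at \(\lambda\) and \(a\lambda\).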
Proof of Theorem 4.3 We learn from Theorem 4.1 that the penalized estimator \(\hat{\beta }^{\lambda }\) shares the properties of the oracle estimator and that \(\textrm{P}\left( \textrm{BIC}_{\lambda _n}=\textrm{BIC}_{S_T}\right) \rightarrow 1\). This means that the BIC criterion can asymptotically select the optimal tuning parameter \(\lambda\) that identifies the true model; that is, there is a unique best \(\lambda\) whose corresponding estimator recovers the true model. Next, we prove \(\textrm{P}\left( \inf _{\lambda \in O_{-} \cup O_{+}} \textrm{BIC}_\lambda >\textrm{BIC}_{\lambda _n}\right) \rightarrow 1\), where \(O_{-}\) and \(O_{+}\) denote the underfitted and overfitted cases, respectively.
The underfitted case Here at least one covariate of the true model is missing. For any \(\lambda \in O_{-}\) satisfying \(S_\lambda \nsupseteq S_T\), by virtue of assumption (A2), we have \(L_n^{S_\lambda } \geqslant L_n^{S_T}\). Recalling the definitions of \(L_n^S\) and \(\textrm{BIC}_\lambda\), we can verify that
Therefore, \(\textrm{P}\left( \inf _{\lambda \in O_{-}} \textrm{BIC}_\lambda >\textrm{BIC}_{\lambda _n}\right) \rightarrow 1\) holds.
The overfitted case We consider models that contain all covariates of the true model together with at least one covariate that does not belong to it. Let
for any \(\lambda \in O_{+}\) satisfying \(S_\lambda \supset S_T\) but \(S_\lambda \ne S_T\). Then, we have
We can learn from assumption (A1) that both \(\hat{\beta }^\lambda\) and \(\hat{\beta }^{\lambda _n}\) are \(\sqrt{n}\)-consistent, i.e., \(n\left\| \hat{\beta }^\lambda -\beta _0\right\| ^2=O_p(1)\) and \(n\left\| \hat{\beta }^{\lambda _n}-\beta _0\right\| ^2=O_p(1)\). Besides, we can verify that \(\bar{L}_n\left( \hat{\beta }^\lambda \right) =O_p(1)\) and \(\bar{L}_n\left( \hat{\beta }^{\lambda _n}\right) =O_p(1)\). Moreover, \(\textrm{P}\left( d f_\lambda -d f_{\lambda _n} \geqslant 1\right) \rightarrow 1\) as \(n \rightarrow \infty\); it follows that the right-hand side of (A.11) diverges to \(\infty\) with probability tending to 1. Hence, \(\textrm{P}\left( \inf _{\lambda \in O_{+}} \textrm{BIC}_\lambda >\textrm{BIC}_{\lambda _n}\right) \rightarrow 1\). In general, we complete the proof by combining the two cases.
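The underfitting/overfitting trade-off above can be illustrated numerically with a generic BIC-type criterion of the form \(\log L_n + df \cdot \log(n)/n\). This form is only a stand-in for the paper's robust BIC, and the loss and degrees-of-freedom numbers below are invented purely for illustration.

```python
import numpy as np

def bic_type(loss, df, n):
    """Generic BIC-type criterion: log(loss) + df * log(n) / n.
    A stand-in for the paper's robust BIC, illustrating the
    trade-off in the proof of Theorem 4.3 (not its exact form)."""
    return np.log(loss) + df * np.log(n) / n

n = 200
bic_true = bic_type(1.00, df=3, n=n)   # true model (hypothetical numbers)
bic_under = bic_type(2.00, df=2, n=n)  # underfitted: loss bounded away from the truth
bic_over = bic_type(0.98, df=5, n=n)   # overfitted: tiny loss gain, extra df
```

Underfitting inflates the loss term by a fixed amount, while overfitting buys only a negligible loss reduction that cannot offset the extra \(\log(n)/n\) penalty per degree of freedom; both criteria therefore exceed that of the true model, mirroring the two cases of the proof.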
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Song, Y., Su, H. & Zhan, M. Local Walsh-average-based Estimation and Variable Selection for Spatial Single-index Autoregressive Models. Netw Spat Econ (2024). https://doi.org/10.1007/s11067-024-09616-4