
Local Walsh-average-based Estimation and Variable Selection for Spatial Single-index Autoregressive Models


Abstract

This paper is concerned with the spatial single-index autoregressive model (SSIM), in which the spatial lag effect enters the model linearly and the relationship between the response and the covariates is a nonparametric function of a linear combination of multivariate regressors. The model addresses challenges related to the curse of dimensionality and interactions among non-independent variables in spatial data. Local Walsh-average regression has proven to be a robust and efficient method for single-index models. We extend this approach to the spatial domain and propose an estimation strategy in which the nonparametric component is estimated by a local Walsh-average approach and the parametric part by the Walsh-average method. Under specific assumptions, we establish the asymptotic properties of both the parametric and nonparametric estimators. Additionally, we propose a robust shrinkage method, termed regularized local Walsh-average (RLWA), that performs robust parametric variable selection and robust nonparametric component estimation simultaneously. Theoretical analysis shows that RLWA enjoys consistency in variable selection and the oracle property in estimation. We also propose a tuning-parameter selection procedure based on a robust BIC-type criterion with an oracle property. The effectiveness of the proposed estimation procedures is evaluated through three Monte Carlo simulations and real data applications, demonstrating good finite-sample performance.
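For concreteness, a minimal sketch of the model form (our reconstruction from the residuals used in the appendix, where \(W=(w_{ij})\) denotes the spatial weight matrix, \(\rho\) the spatial autoregressive parameter, and \(g\) the unknown univariate link function):

$$\begin{aligned} Y_i=\rho \sum _{j=1}^n w_{i j} Y_j+g\left( X_i^{\textrm{T}} \beta _0\right) +\varepsilon _i, \quad i=1, \ldots , n, \end{aligned}$$

with the usual single-index identifiability constraint \(\Vert \beta _0\Vert =1\).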


Data Availability

The data that support the findings of this study are available from the corresponding author upon request.

References

  • Basile R (2008) Regional economic growth in Europe: a semiparametric spatial dependence approach. Pap Reg Sci 87(4):527–544

  • Carroll RJ, Fan J, Gijbels I, Wand MP (1997) Generalized partially linear single-index models. J Am Stat Assoc 92:477–489

  • Delecroix M, Hristache M, Patilea V (2006) On semiparametric estimation in single-index regression. J Stat Plan Inference 136:730–769

  • Du J, Sun X, Cao R, Zhang Z (2018) Statistical inference for partially linear additive spatial autoregressive models. Spat Stat 25:52–67

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  • Fan Y, Härdle WK, Wang W, Zhu L (2018) Single-index-based CoVaR with very high-dimensional covariates. J Bus Econ Stat 36(2):212–226

  • Feng L, Zou C, Wang Z (2012) Local Walsh-average regression. J Multivar Anal 106:36–48

  • Hettmansperger TP, McKean JW (2011) Robust nonparametric statistical methods, 2nd edn. Chapman & Hall, New York

  • Kelejian HH, Prucha IR (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J Real Estate Finance Econ 17:99–121

  • Lee LF (2007) GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J Econom 137:489–514

  • Liu X, Chen J, Cheng S (2018) A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat Stat 25:86–104

  • Peng H, Huang T (2011) Penalized least squares for single index models. J Stat Plan Inference 141:1362–1379

  • Horn RA, Johnson CR (1985) Matrix analysis. Cambridge University Press, Cambridge

  • Shang S, Zou C, Wang Z (2012) Local Walsh-average regression for semiparametric varying-coefficient models. Stat Probab Lett 82:1815–1822

  • Song Y, Li Z, Fang M (2022) Robust variable selection based on penalized composite quantile regression for high-dimensional single-index models. Mathematics 10(12):2000

  • Su L (2012) Semiparametric GMM estimation of spatial autoregressive models. J Econom 167:543–560

  • Su L, Jin S (2010) Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J Econom 157:18–33

  • Su L, Yang Z (2009) Instrumental variable quantile estimation of spatial autoregressive models. Working paper, Singapore Management University, Singapore

  • Sun Y (2017) Estimation of single-index model with spatial interaction. Reg Sci Urban Econ 62:36–45

  • Terpstra J, McKean JW (2005) Rank-based analysis of linear models using R. J Stat Softw 14:1–26

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 58:267–288

  • Wang H (2009) Bayesian estimation and variable selection for single index models. Comput Stat Data Anal 53:2617–2627

  • Wang HJ, Zhu Z, Zhou J (2009) Quantile regression in partially linear varying coefficient models. Ann Stat 37:3841–3866

  • Wang L (2009) Wilcoxon-type generalized Bayesian information criterion. Biometrika 96:163–173

  • Wang L, Kai B, Li R (2009) Local rank inference for varying coefficient models. J Am Stat Assoc 104:1631–1645

  • Wang L, Yang L (2007) Spline-backfitted kernel smoothing of nonlinear additive autoregression model. Ann Stat 35:2474–2503

  • Wu TZ, Yu K, Yu Y (2010) Single-index quantile regression. J Multivar Anal 101:1607–1621

  • Xie T, Cao R, Jiang D (2020) Variable selection for spatial autoregressive models with a diverging number of parameters. Stat Pap 61:1125–1145

  • Xia Y, Härdle WK (2006) Semi-parametric estimation of partially linear single-index models. J Multivar Anal 97:1162–1184

  • Yang J, Lu F, Yang H (2019) Local Walsh-average-based estimation and variable selection for single-index models. Sci China Math 62:1977–1996

  • Zeng P, He T, Zhu Y (2012) A lasso-type approach for estimation and variable selection in single index models. J Comput Graph Stat 21:92–109

  • Zhao W, Jiang X, Lian H (2018) A principal varying-coefficient model for quantile regression: Joint variable selection and dimension reduction. Comput Stat Data Anal 127:269–280

  • Zhao W, Lian H, Liang H (2017) GEE analysis for longitudinal single-index quantile regression. J Stat Plan Inference 187:78–102

  • Zhao W, Zhou Y, Lian H (2018) Time-varying quantile single-index model for multivariate responses. Comput Stat Data Anal 127:32–49

  • Zhu L, Qian L, Lin J (2011) Variable selection in a class of single-index models. Ann Inst Stat Math 63:1277–1293

  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429

  • Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533


Funding

This research is supported by the Fundamental Research Funds for the Central Universities (No. 23CX03012A) and the National Key Research and Development Program of China (2021YFA1000102).

Author information

Authors and Affiliations

Authors

Contributions

Yunquan Song: Data curation, Formal analysis, Software, Visualization, Writing - review and editing. Hang Su: Conceptualization, Software, Visualization, Writing - review and editing. Minmin Zhan: Language refinement and the inclusion of supplementary content. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yunquan Song.

Ethics declarations

Conflicts of Interest

The authors declare they have no financial interests.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 3.1 Let \(\tilde{\theta }\) be the initial estimator of \(\theta _0\). For ease of exposition, we define the following notation:

$$\begin{aligned} \begin{aligned}&\delta _n=n^{-1 / 2}, \theta ^*=\delta _n^{-1}\left( \theta -\theta _0\right) , V_{i j}= \hat{g}^{\prime }\left( X_i^{\textrm{T}} \tilde{\beta }\right) X_i+ \hat{g}^{\prime }\left( X_j^{\textrm{T}} \tilde{\beta }\right) X_j, \\&\Delta _i= g\left( X_i^{\textrm{T}} \beta _0\right) - \hat{g}\left( X_i^{\textrm{T}} \tilde{\beta }\right) - \hat{g}^{\prime }\left( X_i^{\textrm{T}} \tilde{\beta }\right) X_i^{\textrm{T}}\left( \beta _0-\tilde{\beta }\right) . \end{aligned} \end{aligned}$$

With this notation, we rewrite \(\Phi _n(\theta )\) as

$$\begin{aligned} \Phi _n^*\left( \theta ^*\right) =\frac{1}{2 n(n+1)} \sum _{i \le j}\left| \varepsilon _i+\varepsilon _j+\Delta _i+\Delta _j-\delta _n V_{i j}^{\textrm{T}} \theta ^*\right| . \end{aligned}$$

Denote by \(S_n^*\left( \theta ^*\right)\) the gradient of \(\Phi _n^*\left( \theta ^*\right)\); it follows that

$$\begin{aligned} S_n^*\left( \theta ^*\right) =\frac{\partial \Phi _n^*\left( \theta ^*\right) }{\partial \theta ^*}=\frac{-\delta _n}{2 n(n+1)} \sum _{i \le j} {\text {sgn}}\left( \varepsilon _i+\varepsilon _j+\Delta _i+\Delta _j-\delta _n V_{i j}^{\textrm{T}} \theta ^*\right) V_{i j}. \end{aligned}$$

Writing \(U_n \triangleq \delta _n^{-1}\left( S_n^*\left( \theta ^*\right) -S_n^*(0)\right)\), we have

$$\begin{aligned} \begin{aligned} U_n=&\frac{1}{2 n(n+1)} \sum _{i \leqslant j} {\text {sgn}}\left( \varepsilon _i+\varepsilon _j+\Delta _i+\Delta _j\right) V_{i j} \\&-\frac{1}{2 n(n+1)} \sum _{i \leqslant j} {\text {sgn}}\left( \varepsilon _i+\varepsilon _j+\Delta _i+\Delta _j-\delta _n V_{i j}^{\textrm{T}} \theta ^*\right) V_{i j} \\ =&\frac{1}{2 n(n+1)} \sum _{i \leqslant j} H_n\left( P_i, P_j\right) , \end{aligned} \end{aligned}$$

where

$$\begin{aligned} H_n\left( P_i, P_j\right) = \left\{ {\text {sgn}}\left( \varepsilon _i+\varepsilon _j+\Delta _i+\Delta _j\right) \right. \left. -{\text {sgn}}\left( \varepsilon _i+\varepsilon _j+\Delta _i+\Delta _j-\delta _n V_{i j}^{\textrm{T}} \theta ^*\right) \right\} V_{i j} . \end{aligned}$$

Note that \(H_n\left( P_i, P_j\right)\) is symmetric in its arguments, i.e., \(H_n\left( P_i, P_j\right) =H_n\left( P_j, P_i\right)\). Moreover,

$$\begin{aligned} E\left\{ \left\| H_n\left( P_i, P_j\right) \right\| ^2\right\} \leqslant 4 \textrm{E}\left\{ \left\| V_{i j}\right\| ^2\right\} =O(1)=o(n). \end{aligned}$$

By Lemma A.1 of Wang et al. (2009), it can be shown that \(U_n=E\left\{ H_n\left( P_i, P_j\right) \right\} +o_p(1)\). Further combining the \(\sqrt{n}\)-consistency of the initial estimator \(\tilde{\theta }\) with conditions (C.3.1)-(C.3.6), it is not difficult to derive the following result:

$$\begin{aligned} E\left\{ H_n\left( P_i, P_j\right) \right\} =E\left\{ E\left\{ H_n\left( P_i, P_j\right) \mid P_i\right\} \right\} =2 \tau \delta _n \Sigma \theta ^*\left( 1+o_p(1)\right) . \end{aligned}$$

Consequently, we can obtain that

$$\begin{aligned} U_n=\delta _n^{-1}\left( S_n^*\left( \theta ^*\right) -S_n^*(0)\right) =2 \tau \delta _n \Sigma \theta ^*+o_p(1). \end{aligned}$$
(A.1)

Define the quadratic function

$$\begin{aligned} R_n^*\left( \theta ^*\right) \triangleq \delta _n^{-1} \theta ^{* \mathrm {~T}} S_n^*(0)+\frac{1}{2} \tau \delta _n \theta ^{* \mathrm {~T}}(2 \Sigma ) \theta ^*+\delta _n^{-1} \Phi _n^*(0). \end{aligned}$$

Then, for any \(\epsilon >0\) and \(c>0\), we can get

$$\begin{aligned} \textrm{P}\left\{ \sup _{\left\| \theta ^*\right\| \leqslant c}\left| \delta _n^{-1} \Phi _n^*\left( \theta ^*\right) -R_n^*\left( \theta ^*\right) \right| \geqslant \epsilon \right\} \rightarrow 0 \end{aligned}$$
(A.2)

In fact, in view of Eq. (A.1), we have

$$\begin{aligned} \begin{aligned} \nabla \left( \delta _n^{-1} \Phi _n^*\left( \theta ^*\right) -R_n^*\left( \theta ^*\right) \right)&=\delta _n^{-1} S_n^*\left( \theta ^*\right) -\delta _n^{-1} S_n^*(0)-2 \delta _n \tau \Sigma \theta ^* \\&=\delta _n^{-1}\left( S_n^*\left( \theta ^*\right) -S_n^*(0)\right) -2 \delta _n \tau \Sigma \theta ^*=o_p(1) \end{aligned} \end{aligned}$$

Therefore, following the same lines as Theorem A.3.7 in Hettmansperger and McKean (2011), we conclude that (A.2) holds. Next, let \(\hat{\theta }^*\) and \(\bar{\theta }^*\) denote the minimizers of \(\Phi _n^*\left( \theta ^*\right)\) and \(R_n^*\left( \theta ^*\right)\), respectively. Under conditions (C.3.1)-(C.3.6), following an analysis similar to that of Theorem 3.5.5 in Hettmansperger and McKean (2011), we can show that

$$\begin{aligned} \hat{\theta }^*=\bar{\theta }^*+o_p(1) . \end{aligned}$$
(A.3)

By the expression of \(R_n^*\left( \theta ^*\right)\), direct minimization of this quadratic function gives

$$\begin{aligned} \bar{\theta }^*=-\frac{1}{2} \delta _n^{-2} \tau ^{-1} \Sigma ^{-1} S_n^*(0) \end{aligned}$$
(A.4)

On the other hand, let \(S_n(0) \triangleq -\delta _n^{-2} S_n^*(0)\). Then we have

$$\begin{aligned} \begin{aligned} S_n(0)&=\frac{\delta _n^{-1}}{2 n(n+1)} \sum _{i \leqslant j} {\text {sgn}}\left( \varepsilon _i+\Delta _i+\varepsilon _j+\Delta _j\right) V_{i j} \\&=\frac{\delta _n^{-1}}{2 n(n+1)} \sum _{i \leqslant j}\left\{ {\text {sgn}}\left( \varepsilon _i+\Delta _i+\varepsilon _j+\Delta _j\right) -{\text {sgn}}\left( \varepsilon _i+\varepsilon _j\right) +{\text {sgn}}\left( \varepsilon _i+\varepsilon _j\right) \right\} V_{i j} \\&\triangleq S_{n 1}(0)+S_{n 2}(0), \end{aligned} \end{aligned}$$
(A.5)

where

$$\begin{aligned} \begin{aligned} S_{n 1}(0)&=\frac{\delta _n^{-1}}{2 n(n+1)} \sum _{i \le j} {\text {sgn}}\left( \varepsilon _i+\varepsilon _j\right) V_{i j} \triangleq \frac{\delta _n^{-1}}{2 n(n+1)} \sum _{i \leqslant j} H_{n 1}\left( P_i, P_j\right) \\ S_{n 2}(0)&=\frac{\delta _n^{-1}}{2 n(n+1)} \sum _{i \leqslant j}\left[ {\text {sgn}}\left( \varepsilon _i+\Delta _i+\varepsilon _j+\Delta _j\right) -{\text {sgn}}\left( \varepsilon _i+\varepsilon _j\right) \right] V_{i j} \\&\triangleq \frac{\delta _n^{-1}}{2 n(n+1)} \sum _{i \leqslant j} H_{n 2}\left( P_i, P_j\right) . \end{aligned} \end{aligned}$$

Similarly, we can verify that \(\textrm{E}\left\{ \left\| H_{n 1}\left( P_i, P_j\right) \right\| ^2\right\} =o(n)\) and \(\textrm{E}\left\{ H_{n 1}\left( P_i, P_j\right) \right\} =0\). Therefore, we obtain from Lemma A.1 of Wang et al. (2009) that

$$\begin{aligned} S_{n 1}(0)=2 n^{-1 / 2} \sum _{i=1}^n \textrm{E}\left\{ H_{n 1}\left( P_i, P_j\right) \mid P_i\right\} +o_p(1) \triangleq 2 n^{-1 / 2} \sum _{i=1}^n T_n\left( P_i\right) +o_p(1) . \end{aligned}$$

Under the \(\sqrt{n}\)-consistency assumption of \(\tilde{\beta }\), the conditions (C.3.1)-(C.3.6), and some calculations similar to those used in the proof of Theorem 3.2 in Wang et al. (2009), we have

$$\begin{aligned} E\left\{ T_n\left( P_i\right) T_n\left( P_i\right) ^{\textrm{T}}\right\} \rightarrow \frac{1}{4} E\left\{ \left[ g^{\prime }\left( X^{\textrm{T}} \beta _0\right) \right] ^2[2 H(\varepsilon )-1]^2 \tilde{X} \tilde{X}^{\textrm{T}}\right\} \end{aligned}$$

Then, by the Lindeberg–Feller central limit theorem, we have

$$\begin{aligned} S_{n 1}(0) {\mathop {\rightarrow }\limits ^{d}} N\left( 0, E\left\{ \left[ g^{\prime }\left( X^{\textrm{T}} \beta _0\right) \right] ^2[2 H(\varepsilon )-1]^2 \tilde{X} \tilde{X}^{\textrm{T}}\right\} \right) . \end{aligned}$$
(A.6)

For the term \(S_{n 2}(0)\), note that

$$\begin{aligned} \begin{aligned} \Delta _i=&g\left( X_i^{\textrm{T}} \beta _0\right) -\hat{g}\left( X_i^{\textrm{T}} \tilde{\beta }\right) -\hat{g}^{\prime }\left( X_i^{\textrm{T}} \tilde{\beta }\right) X_i^{\textrm{T}}\left( \beta _0-\tilde{\beta }\right) \\ =&g\left( X_i^{\textrm{T}} \tilde{\beta }\right) +g^{\prime }\left( X_i^{\textrm{T}} \tilde{\beta }\right) X_i^{\textrm{T}}\left( \beta _0-\tilde{\beta }\right) +o_p\left( \left( X_i^{\textrm{T}} \beta _0-X_i^{\textrm{T}} \tilde{\beta }\right) ^2\right) \\&-\hat{g}\left( X_i^{\textrm{T}} \tilde{\beta }\right) -\hat{g}^{\prime }\left( X_i^{\textrm{T}} \tilde{\beta }\right) X_i^{\textrm{T}}\left( \beta _0-\tilde{\beta }\right) \\ =&\left[ g\left( X_i^{\textrm{T}} \tilde{\beta }\right) -\hat{g}\left( X_i^{\textrm{T}} \tilde{\beta }\right) \right] +\left[ g^{\prime }\left( X_i^{\textrm{T}} \tilde{\beta }\right) -\hat{g}^{\prime }\left( X_i^{\textrm{T}} \tilde{\beta }\right) \right] X_i^{\textrm{T}}\left( \beta _0-\tilde{\beta }\right) \\&+o_p\left( \left( X_i^{\textrm{T}} \beta _0-X_i^{\textrm{T}} \tilde{\beta }\right) ^2\right) . \end{aligned} \end{aligned}$$

Combining this expansion with the results of Theorem 3.2, condition (C.3.5), and the bandwidth assumption, we have \(\Delta _i=o_p(1)\); similarly, \(\Delta _j=o_p(1)\). It is then not difficult to show that

$$\begin{aligned} S_{n 2}(0)=o_p(1) . \end{aligned}$$
(A.7)

Consequently, combining (A.3)-(A.7) completes the proof.
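For readability, here is a compact assembly of the above steps (a sketch; the exact normalization should match the statement of Theorem 3.1): combining (A.3) and (A.4) with \(S_n(0)=-\delta _n^{-2} S_n^*(0)\) gives \(\hat{\theta }^*=\frac{1}{2 \tau } \Sigma ^{-1} S_n(0)+o_p(1)\), and then (A.5)-(A.7) together with \(\theta ^*=\sqrt{n}\left( \theta -\theta _0\right)\) yield

$$\begin{aligned} \sqrt{n}\left( \hat{\theta }-\theta _0\right) {\mathop {\rightarrow }\limits ^{d}} N\left( 0, \frac{1}{4 \tau ^2} \Sigma ^{-1} E\left\{ \left[ g^{\prime }\left( X^{\textrm{T}} \beta _0\right) \right] ^2[2 H(\varepsilon )-1]^2 \tilde{X} \tilde{X}^{\textrm{T}}\right\} \Sigma ^{-1}\right) . \end{aligned}$$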

Proof of Theorem 3.2 We have \(\hat{\theta }-\theta _0=O_p\left( n^{-1 / 2}\right)\) from Theorem 3.1. Since X has bounded support, the asymptotic normality of \(\hat{g}(u,\beta )\) can be derived within the framework of varying-coefficient models. The rest of the proof is similar to that of Theorem 3 in Shang et al. (2012) and is therefore omitted.

Proof of Theorem 4.1 We first introduce the following lemma, which shows that the penalized RLWA estimator possesses the sparsity property \(\hat{\beta }_{I I}^\lambda =0\).

Lemma A.1 Under conditions (C.3.1)-(C.3.6) given in Section 3, if \(\lambda \rightarrow 0\) and \(\sqrt{n} \lambda \rightarrow \infty\) as \(n \rightarrow \infty\), then, with probability tending to 1, for any constant \(C>0\), we have

$$\begin{aligned} Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) =\min _{\left\| \beta _{I I}\right\| \leqslant C n^{-1 / 2}} Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \beta _{I I}^{\textrm{T}}\right) ^{\textrm{T}}\right) \end{aligned}$$
(A.8)

Proof Using notation similar to that in the proof of Theorem 3.1, let \(\delta _1=c_n^{-1}\left( \beta _I-\beta _{0 I}\right)\), \(\delta _2=c_n^{-1}\left( \beta _{I I}-\beta _{0 I I}\right)\), and \(\delta =\left( \delta _1^{\textrm{T}}, \delta _2^{\textrm{T}}\right) ^{\textrm{T}}\). From the expression of \(Q_\lambda (\beta )\), it is easy to see that

$$\begin{aligned} \begin{aligned} Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) -Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \beta _{I I}^{\textrm{T}}\right) ^{\textrm{T}}\right)&=L_n\left( \left( \beta _I^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) -L_n\left( \left( \beta _I^{\textrm{T}}, \beta _{I I}^{\textrm{T}}\right) ^{\textrm{T}}\right) -\sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right) \\&=L_n^*\left( \left( \delta _1^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) -L_n^*\left( \left( \delta _1^{\textrm{T}}, \delta _2^{\textrm{T}}\right) ^{\textrm{T}}\right) -\sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right) \\&=c_n R_n\left( \left( \delta _1^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) -c_n R_n\left( \left( \delta _1^{\textrm{T}}, \delta _2^{\textrm{T}}\right) ^{\textrm{T}}\right) -\sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right) \\&=\left\{ \left( \left( \delta _1^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) ^{\textrm{T}} S_n(0)+\tau c_n^2\left( \left( \delta _1^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) ^{\textrm{T}} \Sigma \left( \left( \delta _1^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) \right\} \\&\quad -\left( \delta ^{\textrm{T}} S_n(0)+\tau c_n^2 \delta ^{\textrm{T}} \Sigma \delta \right) -\sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right) \\&\triangleq D_1(\delta )-D_2(\delta )-\sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right) , \end{aligned} \end{aligned}$$
(A.9)

where the third equality follows from (A.2) and the fourth holds by the definition of \(R_n\left( \beta ^*\right)\). Furthermore, combining (A.6) and (A.7), we have \(S_n^*(0)=O_p(1)\). This means \(S_n(0)=O_p\left( c_n^2\right)\) because \(S_n^*(0)=-c_n^{-2} S_n(0)\). Note that \(\Vert \delta \Vert =O_p(1)\); it follows that \(D_1(\delta )=O_p\left( c_n^2\right)\). Similarly, we can verify that \(D_2(\delta )=O_p\left( c_n^2\right)\). For the last term of Eq. (A.9), since \(p_{\lambda }(0)=0\), the mean value theorem gives

$$\begin{aligned} \sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right) =\sum _{k=d+1}^p p_\lambda ^{\prime }\left( \left| \xi _k\right| \right) \left| \xi _k\right| \geqslant c_n^2 \sqrt{\log (n)} \sqrt{n / \log (n)} \sqrt{n} \lambda \liminf _{n \rightarrow \infty } \liminf _{t \rightarrow 0^{+}} \frac{p_\lambda ^{\prime }(t)}{\lambda } \cdot \left| \xi _k\right| , \end{aligned}$$

where \(\left| \xi _k\right| \in \left( 0,\left| \beta _k\right| \right)\) for \(k=d+1, d+2, \ldots , p\). Considering the assumption \(\sqrt{n} \lambda \rightarrow \infty\), together with the fact that \(\sqrt{\log (n)} \rightarrow \infty\) and \(\sqrt{n / \log (n)} \rightarrow \infty\) as \(n \rightarrow \infty\), it is clear that \(\sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right)\) is of higher order than \(O_p\left( c_n^2\right)\). That is, for large \(n\), \(Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) -Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \beta _{I I}^{\textrm{T}}\right) ^{\textrm{T}}\right)\) is dominated by the negative term \(-\sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right)\). This completes the proof of Lemma A.1, and we now return to the proof of Theorem 4.1. Lemma A.1 directly implies part (i) of the theorem. Using the same notation as in the proof of Theorem 3.1, we focus on proving (ii). Under the conditions of Theorem 4.1, for any given \(\beta _I\) satisfying \(\left\| \beta _I-\beta _{0 I}\right\| =O_p\left( n^{-1 / 2}\right)\), Lemma A.1 gives

$$\begin{aligned} Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) =\min _{\left\| \beta _{I I}\right\| \leqslant C n^{-1 / 2}} Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \beta _{I I}^{\textrm{T}}\right) ^{\textrm{T}}\right) \end{aligned}$$

Consider \(Q_\lambda (\beta )=L_n(\beta )+\sum _{k=1}^p p_\lambda \left( \left| \beta _k\right| \right)\); then the following first-order condition must be satisfied:

$$\begin{aligned} \left. \frac{\partial Q_\lambda (\beta )}{\partial \beta _k}\right| _{\left( \left( \hat{\beta }_I^\lambda \right) ^{\textrm{T}}, 0^{\textrm{T}}\right) ^{\textrm{T}}}=0 \quad \text{ for } \quad k=1, \ldots , d . \end{aligned}$$
(A.10)

As \(L_n(\beta )\) can be written as \(L_n^*\left( \beta ^*\right)\) with \(\beta ^*=\sqrt{n}\left( \beta -\beta _0\right)\), then

$$\begin{aligned} \frac{\partial L_n(\beta )}{\partial \beta }=\frac{\partial L_n^*\left( \beta ^*\right) }{\partial \beta }=\frac{\partial L_n^*\left( \beta ^*\right) }{\partial \beta ^*} \cdot \frac{\partial \beta ^*}{\partial \beta }=\sqrt{n} S_n\left( \beta ^*\right) . \end{aligned}$$

In addition, from Eq. (A.1) we easily find that \(\sqrt{n} S_n\left( \beta ^*\right) =\sqrt{n} S_n(0)+2 \tau c_n \Sigma \beta ^*\). Let \(e_k\) be the d-dimensional vector whose k-th component equals 1 and whose other components equal 0, and let \(S_{n d}(0)\) be the vector consisting of the first d components of \(S_n(0)\). By direct calculation, Eq. (A.10) is equivalent to

$$\begin{aligned} \begin{aligned} 0=&\left. \frac{\partial L_n(\beta )}{\partial \beta _k}\right| _{\beta =\left( \left( \hat{\beta }_I^\lambda \right) ^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}}+p_\lambda ^{\prime }\left( \left| \hat{\beta }_k^\lambda \right| \right) \cdot {\text {sgn}}\left( \hat{\beta }_k^\lambda \right) \\ =&\left. \varvec{e}_k^{\textrm{T}}\left( \sqrt{n} S_n(0)+2 \tau c_n \Sigma \beta ^*\right) \right| _{\beta =\left( \left( \hat{\beta }_I^\lambda \right) ^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}}+p_\lambda ^{\prime }\left( \left| \hat{\beta }_k^\lambda \right| \right) \cdot {\text {sgn}}\left( \hat{\beta }_k^\lambda \right) \\ =&\varvec{e}_k^{\textrm{T}} \sqrt{n} S_{n d}(0)+2 \tau c_n \Sigma _d \sqrt{n}\left( \hat{\beta }_I^\lambda -\beta _{0 I}\right) +p_\lambda ^{\prime }\left( \left| \beta _{0 k}\right| \right) \cdot {\text {sgn}}\left( \beta _{0 k}\right) \\&+\left( p_\lambda ^{\prime \prime }\left( \left| \beta _{0 k}\right| \right) +o_p(1)\right) \left( \hat{\beta }_k^\lambda -\beta _{0 k}\right) , \end{aligned} \end{aligned}$$

where the last equality is obtained by a Taylor expansion, and \(\Sigma _d\) is defined in Theorem 4.1. Noting that \(c_n=n^{-1 / 2}\), it follows that

$$\begin{aligned} 0=c_n^{-2} \varvec{e}_k^{\textrm{T}} S_{n d}(0)+2 \tau \varvec{e}_k^{\textrm{T}} \Sigma _d \sqrt{n}\left( \hat{\beta }_I^\lambda -\beta _{0 I}\right) +\sqrt{n} p_\lambda ^{\prime }\left( \left| \beta _{0 k}\right| \right) \cdot {\text {sgn}}\left( \beta _{0 k}\right) +p_\lambda ^{\prime \prime }\left( \left| \beta _{0 k}\right| \right) \sqrt{n}\left( \hat{\beta }_k^\lambda -\beta _{0 k}\right) +o_p(1) . \end{aligned}$$

Finally, by the definitions of \(c\) and \(\Lambda\), we obtain

$$\begin{aligned} \left( \Sigma _d+\Lambda / 2\right) \cdot \sqrt{n}\left( \hat{\beta }_I^\lambda -\beta _{0 I}+\left( \Sigma _d+\Lambda / 2\right) ^{-1} c\right) =\frac{-1}{2 \tau } c_n^{-2} S_{n d}(0)=\frac{1}{2 \tau } S_{n d}^*(0) . \end{aligned}$$

Recall from the proof of Theorem 3.1 that \(S_n^*(0) {\mathop {\rightarrow }\limits ^{d}} N\left( 0, \textrm{E}\left\{ \left[ g^{\prime }\left( X^{\textrm{T}} \beta _0\right) \right] ^2[2 H(\varepsilon )-1]^2 \tilde{X} \tilde{X}^{\textrm{T}}\right\} \right)\). By the central limit theorem and Slutsky's theorem, we complete the proof.
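Spelled out (a sketch in our notation: write \(V_d\) for the upper-left \(d \times d\) block of \(V=E\left\{ \left[ g^{\prime }\left( X^{\textrm{T}} \beta _0\right) \right] ^2[2 H(\varepsilon )-1]^2 \tilde{X} \tilde{X}^{\textrm{T}}\right\}\)), the last display together with the limit of \(S_{n d}^*(0)\) yields

$$\begin{aligned} \sqrt{n}\left( \hat{\beta }_I^\lambda -\beta _{0 I}+\left( \Sigma _d+\Lambda / 2\right) ^{-1} c\right) {\mathop {\rightarrow }\limits ^{d}} N\left( 0, \frac{1}{4 \tau ^2}\left( \Sigma _d+\Lambda / 2\right) ^{-1} V_d\left( \Sigma _d+\Lambda / 2\right) ^{-1}\right) , \end{aligned}$$

which is the asymptotic normality part of the oracle property.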

Proof of Theorem 4.3 We know from Theorem 4.1 that the penalized estimator \(\hat{\beta }^{\lambda }\) shares the properties of the oracle estimator and that \(\textrm{P}\left( \textrm{BIC}_{\lambda _n}=\textrm{BIC}_{S_T}\right) \rightarrow 1\). This means that the BIC criterion can asymptotically select the tuning parameter \(\lambda\) that identifies the true model. Therefore, there is a unique optimal \(\lambda\), and its corresponding estimator recovers the true model. Next, we prove that \(\textrm{P}\left( \inf _{\lambda \in O_{-} \cup O_{+}} \textrm{BIC}_\lambda >\textrm{BIC}_{\lambda _n}\right) \rightarrow 1\), where \(O_{-}\) and \(O_{+}\) denote the underfitted and overfitted cases, respectively.

The underfitted case We consider the case in which at least one covariate of the true model is missing. For any \(\lambda \in O_{-}\) satisfying \(S_\lambda \nsupseteq S_T\), by virtue of assumption (A2), we have \(L_n^{S_\lambda } \geqslant L_n^{S_T}\). Recalling the definitions of \(L_n^S\) and \(\textrm{BIC}_\lambda\), we can verify that

$$\begin{aligned} \begin{aligned} \textrm{BIC}_\lambda&=\frac{12 \hat{\tau }^2}{n^2} \sum _{i \leqslant j}\left| Y_i- \rho W Y_i- \hat{g}\left( X_i^{\textrm{T}} \hat{\beta }^\lambda \right) +Y_j- \rho W Y_j- \hat{g}\left( X_j^{\textrm{T}} \hat{\beta }^\lambda \right) \right| +\frac{\log (n)}{n} d f_\lambda \\&\geqslant 12 \hat{\tau }^2 L_n^{S_\lambda }+\frac{\log (n)}{n} d f_\lambda \geqslant 12 \hat{\tau }^2 L_n^{S_T} . \end{aligned} \end{aligned}$$

Therefore, \(\textrm{P}\left( \inf _{\lambda \in O_{-}} \textrm{BIC}_\lambda >\textrm{BIC}_{\lambda _n}\right) \rightarrow 1\) holds.

The overfitted case We consider the case in which the model contains all covariates of the true model together with at least one covariate that does not belong to it. Let

$$\begin{aligned} \bar{L}_n(\beta )=\frac{1}{n} \sum _{i \leqslant j}\left| Y_i-\rho W Y_i-\hat{g}\left( X_i^{\textrm{T}} \beta \right) +Y_j-\rho W Y_j-\hat{g}\left( X_j^{\textrm{T}} \beta \right) \right| \end{aligned}$$

for any \(\lambda \in O_{+}\) satisfying \(S_\lambda \supset S_T\) but \(S_\lambda \ne S_T\). Then, we have

$$\begin{aligned} n\left( \textrm{BIC}_\lambda -\textrm{BIC}_{\lambda _n}\right) =12 \hat{\tau }^2\left( \bar{L}_n\left( \hat{\beta }^\lambda \right) -\bar{L}_n\left( \hat{\beta }^{\lambda _n}\right) \right) +\left( d f_\lambda -d f_{\lambda _n}\right) \cdot \log (n) \end{aligned}$$
(A.11)

We can learn from assumption (A1) that both \(\hat{\beta }^\lambda\) and \(\hat{\beta }^{\lambda _n}\) are \(\sqrt{n}\)-consistent, i.e., \(n\left\| \hat{\beta }^\lambda -\beta _0\right\| ^2=O_p(1)\) and \(n\left\| \hat{\beta }^{\lambda _n}-\beta _0\right\| ^2=O_p(1)\). Besides, we can verify that \(\bar{L}_n\left( \hat{\beta }^\lambda \right) =O_p(1)\) and \(\bar{L}_n\left( \hat{\beta }^{\lambda _n}\right) =O_p(1)\). Moreover, \(\textrm{P}\left( d f_\lambda -d f_{\lambda _n} \geqslant 1\right) \rightarrow 1\) as \(n \rightarrow \infty\); it follows that the right-hand side of (A.11) diverges to \(\infty\) with probability tending to 1. Hence, \(\textrm{P}\left( \inf _{\lambda \in O_{+}} \textrm{BIC}_\lambda >\textrm{BIC}_{\lambda _n}\right) \rightarrow 1\). Combining the two cases completes the proof.
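To make the selection rule concrete, the following is a minimal sketch (in Python; `fit_fn` is a hypothetical placeholder for the RLWA fit at a given tuning parameter, not an interface from the paper) of the robust BIC-type grid search implied by the display for \(\textrm{BIC}_\lambda\) above:

```python
import numpy as np

def robust_bic(resid, tau_hat, df_lam):
    """Robust BIC-type criterion used in the proof of Theorem 4.3.

    resid   : residuals Y_i - rho_hat*(W @ Y)_i - g_hat(X_i' beta_hat)
    tau_hat : estimate of the scale constant tau
    df_lam  : number of nonzero coefficients in beta_hat (model size)
    """
    n = resid.shape[0]
    # Walsh-average loss: sum of |r_i + r_j| over all pairs with i <= j
    i, j = np.triu_indices(n)  # upper triangle, including the diagonal
    loss = np.abs(resid[i] + resid[j]).sum()
    return 12.0 * tau_hat**2 / n**2 * loss + np.log(n) / n * df_lam

def select_lambda(lambdas, fit_fn):
    """Return the lambda on the grid minimizing the robust BIC.

    fit_fn(lam) is assumed to return (resid, tau_hat, df_lam) for the
    penalized RLWA fit at tuning parameter lam.
    """
    bics = [robust_bic(*fit_fn(lam)) for lam in lambdas]
    return lambdas[int(np.argmin(bics))]
```

Theorem 4.3 then says that, with probability tending to one, the minimizer of this criterion over the grid identifies the true model.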

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Song, Y., Su, H. & Zhan, M. Local Walsh-average-based Estimation and Variable Selection for Spatial Single-index Autoregressive Models. Netw Spat Econ (2024). https://doi.org/10.1007/s11067-024-09616-4

