Abstract
This paper is concerned with the spatial single-index autoregressive model (SSIM), in which the spatial lag effect enters the model linearly and the relationship between the response and covariates is a nonparametric function of a linear combination of multivariate regressors. The model addresses challenges related to the curse of dimensionality and interactions among non-independent variables in spatial data. Local Walsh-average regression has proven to be a robust and efficient method for single-index models. We extend this approach to the spatial domain and propose an estimation strategy in which the nonparametric component is estimated by a local Walsh-average approach and the parametric part by the Walsh-average method. Under specific assumptions, we establish the asymptotic properties of both the parametric and nonparametric estimators. Additionally, we propose a robust shrinkage method, termed regularized local Walsh-average (RLWA), that performs robust parametric variable selection and robust nonparametric component estimation simultaneously. Theoretical analysis shows that RLWA enjoys consistency in variable selection and the oracle property in estimation. We further propose a tuning-parameter selection procedure based on a robust BIC-type criterion that retains the oracle property. The effectiveness of the proposed estimation procedure is evaluated through three Monte Carlo simulations and real-data applications, demonstrating good finite-sample performance.
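For concreteness, the data-generating mechanism described above can be sketched numerically. This is a minimal illustration, assuming a row-standardized neighbour matrix \(W\) on a line graph, a sine link in place of the unknown function \(g\), and illustrative values of the spatial parameter \(\rho\) and index \(\beta\); none of these specific choices come from the paper itself.

```python
import numpy as np

# Hypothetical sketch of data from a spatial single-index autoregressive
# model: Y = rho*W*Y + g(X @ beta) + eps, so that
# Y = (I - rho*W)^{-1} (g(X @ beta) + eps).
# The weight matrix, link function, rho, and beta are illustrative only.

def line_weight_matrix(n):
    """Row-standardized weight matrix of a simple line graph."""
    W = np.zeros((n, n))
    for i in range(n):
        if i > 0:
            W[i, i - 1] = 1.0
        if i < n - 1:
            W[i, i + 1] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def simulate_ssim(n=100, rho=0.3, seed=0):
    rng = np.random.default_rng(seed)
    beta = np.array([2.0, 1.0]) / np.sqrt(5.0)  # ||beta|| = 1 for identifiability
    X = rng.normal(size=(n, 2))
    eps = rng.normal(scale=0.5, size=n)
    W = line_weight_matrix(n)
    # The true link g is unknown in practice; sin is used only for illustration.
    Y = np.linalg.solve(np.eye(n) - rho * W, np.sin(X @ beta) + eps)
    return Y, X, W, beta
```

The unit-norm constraint on \(\beta\) reflects the usual identifiability condition for single-index models; the spatial lag is handled by solving the reduced-form system rather than simulating \(Y\) recursively.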
Data Availability
The data that support the findings of this study are available from the corresponding author upon request.
References
Basile R (2008) Regional economic growth in Europe: a semiparametric spatial dependence approach. Pap Reg Sci 87(4):527–544
Carroll RJ, Fan J, Gijbels I, Wand MP (1997) Generalized partially linear single-index models. J Am Stat Assoc 92:477–489
Delecroix M, Hristache M, Patilea V (2006) On semiparametric estimation in single-index regression. J Stat Plan Inference 136:730–769
Du J, Sun X, Cao R, Zhang Z (2018) Statistical inference for partially linear additive spatial autoregressive models. Spat Stat 25:52–67
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan Y, Härdle WK, Wang W, Zhu L (2018) Single-index-based CoVaR with very high-dimensional covariates. J Bus Econ Stat 36(2):212–226
Feng L, Zou C, Wang Z (2012) Local Walsh-average regression. J Multivar Anal 106:36–48
Hettmansperger TP, McKean JW (2011) Robust nonparametric statistical methods, 2nd edn. Chapman-Hall, New York
Kelejian HH, Prucha IR (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J Real Estate Finance Econ 17:99–121
Lee LF (2007) GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J Econom 137:489–514
Liu X, Chen J, Cheng S (2018) A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat Stat 25:86–104
Peng H, Huang T (2011) Penalized least squares for single index models. J Stat Plan Inference 141:1362–1379
Horn RA, Johnson CR (1985) Matrix analysis. Cambridge University Press, Cambridge
Shang S, Zou C, Wang Z (2012) Local Walsh-average regression for semiparametric varying-coefficient models. Stat Probab Lett 82:1815–1822
Song Y, Li Z, Fang M (2022) Robust variable selection based on penalized composite quantile regression for high-dimensional single-index models. Mathematics 10(12):2000
Su L (2012) Semiparametric GMM estimation of spatial autoregressive models. J Econom 167:543–560
Su L, Jin S (2010) Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J Econom 157:18–33
Su L, Yang Z (2009) Instrumental variable quantile estimation of spatial autoregressive models. Working Paper, Singapore Management University, Singapore
Sun Y (2017) Estimation of single-index model with spatial interaction. Reg Sci Urban Econ 62:36–45
Terpstra J, McKean JW (2005) Rank-based analysis of linear models using R. J Stat Softw 14:1–26
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 58:267–288
Wang H (2009) Bayesian estimation and variable selection for single index models. Comput Stat Data Anal 53:2617–2627
Wang HJ, Zhu Z, Zhou J (2009) Quantile regression in partially linear varying coefficient models. Ann Stat 37:3841–3866
Wang L (2009) Wilcoxon-type generalized Bayesian information criterion. Biometrika 96:163–173
Wang L, Kai B, Li R (2009) Local rank inference for varying coefficient models. J Am Stat Assoc 104:1631–1645
Wang L, Yang L (2006) Spline-backfitted kernel smoothing of nonlinear additive autoregression model. Ann Stat 35:2474–2503
Wu TZ, Yu K, Yu Y (2010) Single-index quantile regression. J Multivar Anal 101:1607–1621
Xie T, Cao R, Jiang D (2020) Variable selection for spatial autoregressive models with a diverging number of parameters. Stat Pap 61:1125–1145
Xia Y, Härdle WK (2006) Semi-parametric estimation of partially linear single-index models. J Multivar Anal 97:1162–1184
Yang J, Lu F, Yang H (2019) Local Walsh-average-based estimation and variable selection for single-index models. Sci China Math 62:1977–1996
Zeng P, He T, Zhu Y (2012) A lasso-type approach for estimation and variable selection in single index models. J Comput Graph Stat 21:92–109
Zhao W, Jiang X, Lian H (2018) A principal varying-coefficient model for quantile regression: Joint variable selection and dimension reduction. Comput Stat Data Anal 127:269–280
Zhao W, Lian H, Liang H (2017) Gee analysis for longitudinal single-index quantile regression. J Stat Plan Inference 187:78–102
Zhao W, Zhou Y, Lian H (2018) Time-varying quantile single-index model for multivariate responses. Comput Stat Data Anal 127:32–49
Zhu L, Qian L, Lin J (2011) Variable selection in a class of single-index models. Ann Inst Stat Math 63:1277–1293
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533
Funding
This research is supported by the Fundamental Research Funds for the Central Universities (No. 23CX03012A) and the National Key Research and Development Program of China (2021YFA1000102).
Author information
Authors and Affiliations
Contributions
Yunquan Song: Data curation, Formal analysis, Software, Visualization, Writing - review and editing. Hang Su: Conceptualization, Software, Visualization, Writing - review and editing. Minmin Zhan: Language refinement and the inclusion of supplementary content. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflicts of Interest
The authors declare they have no financial interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Theorem 3.1 Let \(\tilde{\theta }\) be the initial estimate of \(\theta _0\). For ease of presentation, we define the following notation:
With these notations, we rewrite \(\Phi _n(\theta )\) as follows:
Denote by \(S_n^*\left( \theta ^*\right)\) the gradient function of \(\Phi _n^*\left( \theta ^*\right)\); it follows that
Writing \(U_n \triangleq \delta _n^{-1}\left( S_n^*\left( \theta ^*\right) -S_n^*(0)\right)\), we show that
where
Note that \(H_n\left( P_i, P_j\right)\) is symmetric in its arguments, i.e., \(H_n\left( P_i, P_j\right) =H_n\left( P_j, P_i\right)\). Taking into account the fact that
By Lemma A.1 of Wang et al. (2009), it can be shown that \(U_n=E\left\{ H_n\left( P_i, P_i\right) \right\} +o_p(1)\). Further combining the \(\sqrt{n}\)-consistency assumption of \(\tilde{\theta }\) with the conditions (C.3.1)-(C.3.6), it is not difficult to derive the following result:
Consequently, we can obtain that
The quadratic function can be defined as
Then, for any \(\epsilon >0\) and \(c>0\), we can get
In fact, in view of Eq. (A.1), we have
Therefore, following the same lines as Theorem A.3.7 in Hettmansperger and McKean (2011), we have that (A.2) holds. Next, we specify \(\hat{\theta }^*\) and \(\bar{\theta }^*\) as the minimizers of \(\Phi _n^*\left( \theta ^*\right)\) and \(R_n^*\left( \theta ^*\right)\), respectively. Under the conditions (C.3.1)-(C.3.6), following an analysis similar to that of Theorem 3.5.5 in Hettmansperger and McKean (2011), we can show that
By the expression of \(R_n^*\left( \theta ^*\right)\), it suffices to show that
On the other hand, let \(S_n(0) \triangleq -\delta _n^{-2} S_n^*(0)\). Then we have
where
Similarly, we can verify that \(\textrm{E}\left\{ \left\| H_{n 1}\left( P_i, P_j\right) \right\| ^2\right\} =o(n)\) and \(\textrm{E}\left\{ H_{n 1}\left( P_i, P_j\right) \right\} =0\). Therefore, we obtain from Lemma A.1 of Wang et al. (2009) that
Under the \(\sqrt{n}\)-consistency assumption of \(\tilde{\beta }\), the conditions (C.3.1)-(C.3.6), and some calculations similar to those used in the proof of Theorem 3.2 in Wang et al. (2009), we have
Then by the Lindeberg-Feller central limit theorem, we have
For the term \(S_{n 2}^*(0)\), since
For the term \(S_{n 2}(0)\), from the results of Theorem 1, condition (C.3.5) and the bandwidth assumption, we have \(\Delta _i=o_p(1)\). Similarly, \(\Delta _j=o_p(1)\) also holds. Then, it is not difficult to show that
Consequently, from (A.3)-(A.7), we finish the proof.
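The Walsh-average criterion underlying the objective \(\Phi_n\) can be illustrated in the simplest location setting: minimizing \(\sum_{i\le j}|e_i+e_j|\) over a location shift yields the median of the pairwise (Walsh) averages, i.e. the Hodges-Lehmann estimator. The following sketch, with invented data, is only a numerical check of this well-known fact, not the paper's full estimation procedure.

```python
import numpy as np
from itertools import combinations_with_replacement

def walsh_average_loss(theta, y):
    """Walsh-average objective: sum over pairs i <= j of |(y_i - theta) + (y_j - theta)|."""
    r = y - theta
    return sum(abs(r[i] + r[j])
               for i, j in combinations_with_replacement(range(len(y)), 2))

def hodges_lehmann(y):
    """Median of the Walsh averages (y_i + y_j)/2, a minimizer of the loss above."""
    walsh = [(y[i] + y[j]) / 2.0
             for i, j in combinations_with_replacement(range(len(y)), 2)]
    return float(np.median(walsh))
```

For example, with data [1, 2, 3, 10] the estimate is the median of the ten Walsh averages, and the loss there is no larger than at nearby candidate values; this robustness to the outlying observation is what the local Walsh-average approach carries over to the nonparametric component.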
Proof of Theorem 3.2 We have \(\hat{\theta }-\theta _0=O_p\left( n^{-1 / 2}\right)\) from Theorem 3.1. Since X has bounded support, we can derive the asymptotic normality of \(\hat{g}(u,\beta )\) within the framework of the varying-coefficient model. Therefore, as the proof is similar to that of Theorem 3 in Shang et al. (2012), we omit it here.
Proof of Theorem 4.1 We first introduce the following lemma to show that the penalized RLWA estimator possesses the sparsity property \(\hat{\beta }_{I I}^\lambda =0\).
Lemma A.1 Under the conditions (C.3.1)-(C.3.6) given in Section 3, if \(\lambda \rightarrow 0\) and \(\sqrt{n} \lambda \rightarrow \infty\) as \(n \rightarrow \infty\), then, with probability tending to 1, for any constant \(C>0\), we have
Proof. Using notation similar to that in the proof of Theorem 3.1, let \(\delta _1=c_n^{-1}\left( \beta _I-\beta _{0 I}\right) , \delta _2=c_n^{-1}\left( \beta _{I I}-\beta _{0 I I}\right)\) and \(\delta =\left( \delta _1^{\textrm{T}}, \delta _2^{\textrm{T}}\right) ^{\textrm{T}}\). From the expression of \(Q_\lambda (\beta )\), it is easy to see that
The third equality follows from (A.2), and the fourth holds by the definition of \(R_n\left( \beta ^*\right)\). Furthermore, by combining (A.6) and (A.7), we have \(S_n^*(0)=O_p(1)\). This means \(S_n(0)=O_p\left( c_n^2\right)\) because \(S_n^*(0)=-c_n^{-2} S_n(0)\). Note that \(\Vert \delta \Vert =O_p(1)\); it follows that \(D_1(\delta )=O_p\left( c_n^2\right)\). Similarly, we can verify that \(D_2(\delta )=O_p\left( c_n^2\right)\). For the last term of Eq. (A.9), since \(p_{\lambda }(0)=0\), the mean value theorem gives
where \(\left| \xi _k\right| \in \left( 0,\left| \beta _k\right| \right)\) for \(k=d+1, d+2, \ldots , p\). Considering the assumption \(\sqrt{n} \lambda \rightarrow \infty\), together with the fact that \(\sqrt{\log (n)} \rightarrow \infty\) and \(\sqrt{n / \log (n)} \rightarrow \infty\) as \(n \rightarrow \infty\), it is clear that \(\sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right)\) is of a higher order than \(O_p\left( c_n^2\right)\). That is, \(Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \textbf{0}^{\textrm{T}}\right) ^{\textrm{T}}\right) -Q_\lambda \left( \left( \beta _I^{\textrm{T}}, \beta _{I I}^{\textrm{T}}\right) ^{\textrm{T}}\right)\) is dominated by the negative term \(-\sum _{k=d+1}^p p_\lambda \left( \left| \beta _k\right| \right)\) for large \(n\). This completes the proof of Lemma A.1, and we now continue with the proof of Theorem 4.1. Lemma A.1 shows that part (i) of the theorem holds. Then, using the same notation as in the proof of Theorem 3.1, we focus on proving (ii). Under the conditions of Theorem 4.1, for any given \(\beta _I\) satisfying \(\left\| \beta _I-\beta _{0 I}\right\| =O_p\left( n^{-1 / 2}\right)\), Lemma A.1 yields
Consider \(Q_\lambda (\beta )=L_n(\beta )+\sum _{k=1}^p p_\lambda \left( \left| \beta _k\right| \right)\), and then the following canonical equation must be satisfied:
As \(L_n(\beta )\) can be written as \(L_n^*\left( \beta ^*\right)\) with \(\beta ^*=\sqrt{n}\left( \beta -\beta _0\right)\), then
In addition, through Eq. (A.1), we can easily find that \(\sqrt{n} S_n\left( \beta ^*\right) =\sqrt{n} S_n(0)+2 \tau c_n \Sigma \beta ^*\). Let \(e_k\) be a d-dimensional vector with the k-th component being equal to 1 and the other components being equal to 0, and \(S_{n d}(0)\) be a vector including the first d components of \(S_n(0)\). Through calculation, we can prove that Eq. (A.10) is equal to
where the last equation is obtained by Taylor expansion, and \(\Sigma _d\) is defined in Theorem 4.1. Note that \(c_n=n^{-1 / 2}\); it follows that
Eventually, by defining c and \(\Lambda\), we can get the following equation
Recall that \(S_n^*(0) {\mathop {\rightarrow }\limits ^{d}} N\left( 0, \textrm{E}\left\{ g^{\prime }\left( X^{\textrm{T}} \beta _0\right) ^2[2 M(\varepsilon )-1]^2 \tilde{X} \tilde{X}^{\textrm{T}}\right\} \right)\) from the proof of Theorem 3.1. By the central limit theorem and Slutsky’s theorem, we complete the proof.
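The penalty \(p_\lambda\) in \(Q_\lambda(\beta)\) is generic in the proof above; a standard choice satisfying \(p_\lambda(0)=0\), used in this literature, is the SCAD penalty of Fan and Li (2001). The sketch below follows that paper's closed form and its conventional default \(a=3.7\); it is illustrative rather than a statement of the present paper's exact implementation.

```python
import numpy as np

def scad_penalty(beta_abs, lam, a=3.7):
    """SCAD penalty p_lambda(|beta|) of Fan and Li (2001), evaluated elementwise:
    linear near zero, quadratic transition, then constant for large |beta|."""
    beta_abs = np.asarray(beta_abs, dtype=float)
    return np.where(
        beta_abs <= lam,
        lam * beta_abs,
        np.where(
            beta_abs <= a * lam,
            (2 * a * lam * beta_abs - beta_abs**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )
```

Because the penalty is constant beyond \(a\lambda\), large coefficients are left unshrunk, which is what drives the oracle property invoked in Theorem 4.1; the three pieces join continuously at \(\lambda\) and \(a\lambda\).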
Proof of Theorem 4.3 We learn from Theorem 4.1 that the penalized estimator \(\hat{\beta }^{\lambda }\) shares the properties of the oracle estimator and that \(\textrm{P}\left( \textrm{BIC}_{\lambda _n}=\textrm{BIC}_{S_T}\right) \rightarrow 1\). This means that the BIC criterion can asymptotically select the optimal tuning parameter \(\lambda\) that identifies the true model; that is, there is a unique best \(\lambda\) whose corresponding estimator recovers the true model. Next, we prove \(\textrm{P}\left( \inf _{\lambda \in O_{-} \cup O_{+}} \textrm{BIC}_\lambda >\textrm{BIC}_{\lambda _n}\right) \rightarrow 1\), where \(O_{-}\) and \(O_{+}\) denote the underfitted and overfitted cases, respectively.
The underfitted case Here at least one covariate of the true model is missing. For any \(\lambda \in O_{-}\) satisfying \(S_\lambda \nsupseteq S_T\), by virtue of assumption (A2), we have \(L_n^{S_\lambda } \geqslant L_n^{S_T}\). Recalling the definitions of \(L_n^S\) and \(\textrm{BIC}_\lambda\), we can verify that
Therefore, \(\textrm{P}\left( \inf _{\lambda \in O_{-}} \textrm{BIC}_\lambda >\textrm{BIC}_{\lambda _n}\right) \rightarrow 1\) holds.
The overfitted case We consider models that contain all covariates of the true model together with at least one covariate that does not belong to it. Let
for any \(\lambda \in O_{+}\) satisfying \(S_\lambda \supset S_T\) but \(S_\lambda \ne S_T\). Then, we have
We can learn from assumption (A1) that both \(\hat{\beta }^\lambda\) and \(\hat{\beta }^{\lambda _n}\) are \(\sqrt{n}\)-consistent, i.e., \(n\left\| \hat{\beta }^\lambda -\beta _0\right\| ^2=O_p(1)\) and \(n\left\| \hat{\beta }^{\lambda _n}-\beta _0\right\| ^2=O_p(1)\). Besides, we can verify that \(\bar{L}_n\left( \hat{\beta }^\lambda \right) =O_p(1)\) and \(\bar{L}_n\left( \hat{\beta }^{\lambda _n}\right) =O_p(1)\). Moreover, \(\textrm{P}\left( d f_\lambda -d f_{\lambda _n} \geqslant 1\right) \rightarrow 1\) as \(n \rightarrow \infty\); it follows that the right-hand side of (A.11) diverges to \(\infty\) with probability tending to 1. Hence, \(\textrm{P}\left( \inf _{\lambda \in O_{+}} \textrm{BIC}_\lambda >\textrm{BIC}_{\lambda _n}\right) \rightarrow 1\). In general, we complete the proof by combining the two cases.
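The underfitting/overfitting trade-off above can be illustrated numerically with a generic BIC-type criterion of the form \(\log L_n + df \cdot \log(n)/n\). This form is only a stand-in for the paper's robust BIC, and the loss and degrees-of-freedom numbers below are invented purely for illustration.

```python
import numpy as np

def bic_type(loss, df, n):
    """Generic BIC-type criterion: log(loss) + df * log(n) / n.
    A stand-in for the paper's robust BIC, illustrating the
    trade-off in the proof of Theorem 4.3 (not its exact form)."""
    return np.log(loss) + df * np.log(n) / n

n = 200
bic_true = bic_type(1.00, df=3, n=n)   # true model (hypothetical numbers)
bic_under = bic_type(2.00, df=2, n=n)  # underfitted: loss bounded away from the truth
bic_over = bic_type(0.98, df=5, n=n)   # overfitted: tiny loss gain, extra df
```

Underfitting inflates the loss term by a fixed amount, while overfitting buys only a negligible loss reduction that cannot offset the extra \(\log(n)/n\) penalty per degree of freedom; both criteria therefore exceed that of the true model, mirroring the two cases of the proof.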
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Song, Y., Su, H. & Zhan, M. Local Walsh-average-based Estimation and Variable Selection for Spatial Single-index Autoregressive Models. Netw Spat Econ (2024). https://doi.org/10.1007/s11067-024-09616-4