Robust variable selection with exponential squared loss for the partially linear varying coefficient spatial autoregressive model

Abstract

The partially linear varying coefficient spatial autoregressive model is a semiparametric spatial autoregressive model in which the coefficients of some explanatory variables are allowed to vary while the coefficients of the remaining explanatory variables are held constant. For the nonparametric part, a local linear smoothing method is used to estimate the vector of coefficient functions. To address the variable selection problem, this paper proposes a penalized robust regression estimator based on the exponential squared loss, which selects important explanatory variables and estimates the parameters simultaneously. A solution algorithm is constructed by combining the block coordinate descent (BCD) algorithm with the concave-convex procedure (CCCP). The robustness of the proposed variable selection method is demonstrated through numerical simulations and illustrated with housing data from Airbnb.
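
As a rough illustration of the objective being maximized (a minimal sketch, not the authors' implementation: the variable names, the SCAD choice of \(p_\lambda\) with its conventional constant \(a=3.7\), and the use of a single scalar tuning parameter in place of the coordinate-wise \(\lambda_j\) are assumptions for concreteness), the penalized exponential squared loss, cf. (A.2) in Appendix A, can be written as follows:

```python
import numpy as np

def exp_squared_term(residuals, gamma):
    # Exponential squared "likelihood": residuals far from zero contribute
    # almost nothing, which is the source of the estimator's robustness.
    return np.exp(-residuals**2 / gamma)

def scad_penalty(theta, lam, a=3.7):
    # SCAD penalty of Fan and Li (2001); one common choice of p_lambda.
    t = np.abs(theta)
    return np.where(
        t <= lam,
        lam * t,
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )

def penalized_objective(theta, y, g, gamma, lam):
    # l_n(theta) of (A.2): summed exponential squared term minus n times
    # the penalty; the BCD/CCCP algorithm maximizes this over theta.
    n = len(y)
    return exp_squared_term(y - g @ theta, gamma).sum() - n * scad_penalty(theta, lam).sum()
```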

References

  • Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2(1):183–202

  • Cliff A, Ord J (1973) Spatial autocorrelation. Pion, London

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

  • Forsythe GE, Malcolm MA, Moler CB (1977) Computer methods for mathematical computations. Prentice-Hall, Englewood Cliffs

  • Guo S, Wei CH (2015) Variable selection for spatial autoregressive models. J Minzu Univ China (Nat Sci Ed)

  • Kelejian HH (2008) A spatial J-test for model specification against a single or a set of non-nested alternatives. Lett Spat Resour Sci 1(1):3–11

  • Kelejian HH, Piras G (2011) An extension of Kelejian’s J-test for non-nested spatial models. Reg Sci Urban Econ 41(3):281–292

  • Kelejian HH, Piras G (2014) An extension of the J-test to a spatial panel data framework. J Appl Econom 31(2):387–402

  • Li T, Yin Q, Peng J (2020) Variable selection of partially linear varying coefficient spatial autoregressive model. J Stat Comput Simul 90(15):2681–2704

  • Liu X, Chen J, Cheng S (2018) A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat Stat 25:86–104

  • Ma Y, Pan R, Zou T, Wang H (2020) A naive least squares method for spatial autoregression with covariates. Stat Sin 30(2):653–672

  • Mu J, Wang G, Wang L (2020) Spatial autoregressive partially linear varying coefficient models. J Nonparametric Stat 32(2):428–451

  • Song Y, Liang X, Zhu Y, Lin L (2021) Robust variable selection with exponential squared loss for the spatial autoregressive model. Comput Stat Data Anal 155:107094

  • Su L, Jin S (2010) Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J Econom 157(1):18–33

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267–288

  • Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-lasso. J Bus Econ Stat 25(3):347–355

  • Wang X, Jiang Y, Huang M, Zhang H (2013) Robust variable selection with exponential squared loss. J Am Stat Assoc 108(502):632–643

  • Yuille AL, Rangarajan A (2003) The concave-convex procedure. Neural Comput 15(4):915–936

  • Zhang X, Yu J (2018) Spatial weights matrix selection and model averaging for spatial autoregressive models. J Econom 203(1):1–18

Author information

Contributions

JY and YS wrote the main manuscript text and JD prepared Tables 1–3. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yunquan Song.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Handling Editor: Luiz Duczmal.

This research was supported by the Fundamental Research Funds for the Central Universities (No. 23CX03012A) and the National Key Research and Development Program of China (2021YFA1000102).

Appendices

A Proof of Theorem 1

Let \(\xi =n^{-1/2}+a_n\). Similar to Fan and Li (2001), we first prove that for any given \(\varepsilon >0\), there exists a constant C such that

$$\begin{aligned} P\left\{ \sup _{\parallel u\parallel =C}\ell \left( \theta _0+\xi u\right) <\ell \left( \theta _0\right) \right\} \ge 1-\varepsilon \end{aligned}$$
(A.1)

where u is a (p+1)-dimensional vector such that \(\parallel u\parallel =C\). Display (A.1) implies that, with probability at least \(1-\varepsilon\), there exists a local maximizer \(\hat{\theta }_n\) in the ball \(\left\{ \theta _0+\xi u:\parallel u\parallel \le C\right\}\), so that \(\left\| \hat{\theta }_n-\theta _0\right\| =O_p(\xi )\). Note that minimizing (10) is equivalent to maximizing

$$\begin{aligned} \ell _n\left( \theta \right) =\sum _{i=1}^{n}{\exp \left\{ -\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta \right) ^2/\gamma _n\right\} }-n\sum _{j=1}^{p}{p_{\lambda _j}\left( \left| \theta _j\right| \right) }{.} \end{aligned}$$
(A.2)

Let

$$\begin{aligned} D(\theta ,\gamma )=\sum _{i=1}^{n}\exp \left\{ -\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta \right) ^2/\gamma \right\} \frac{2\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta \right) }{\gamma }{\widetilde{G}}_i{.} \end{aligned}$$
(A.3)
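
The vector \(D(\theta ,\gamma )\) in (A.3) is the gradient of the loss term of \(\ell _n\) with respect to \(\theta\). A quick finite-difference check confirms the formula; this is a hedged sketch with synthetic stand-ins for \({\widetilde{Y}}\) and \({\widetilde{G}}\) (the data, dimensions, and seed below are illustrative assumptions, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4
G = rng.normal(size=(n, p))          # stand-in for the transformed design G-tilde
theta0 = np.array([1.0, -0.5, 0.0, 2.0])
Y = G @ theta0 + rng.normal(size=n)  # stand-in for Y-tilde
gamma = 2.0

def Q(theta):
    # First (loss) term of l_n: sum of exponential squared contributions.
    r = Y - G @ theta
    return np.exp(-r**2 / gamma).sum()

def D(theta):
    # Analytic gradient of Q, matching (A.3): sum_i w_i * (2 r_i / gamma) * G_i.
    r = Y - G @ theta
    return (np.exp(-r**2 / gamma) * (2 * r / gamma)) @ G

# Central finite differences should agree with D(theta) to high accuracy.
theta = rng.normal(size=p)
eps = 1e-6
num_grad = np.array([
    (Q(theta + eps * e) - Q(theta - eps * e)) / (2 * eps)
    for e in np.eye(p)
])
assert np.allclose(num_grad, D(theta), rtol=1e-4, atol=1e-6)
```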

Since \(p_{\lambda _j}(0)=0\) for \(j=1,\ldots ,p\) and \(\gamma _n-\gamma _0=o_p(1)\), by Taylor’s expansion we have

$$\begin{aligned} \begin{aligned}&\ell \left( \theta _0+\xi u\right) -\ell \left( \theta _0\right) \\&\quad =\sum _{i=1}^{n}\exp \left\{ -\frac{\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T(\theta _0+\xi u)\right) ^2}{\gamma _n} \right\} -\sum _{i=1}^{n}\exp \left\{ -\frac{\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta _0\right) ^2}{\gamma _n} \right\} -n\sum _{j=1}^{p}\left\{ p_{\lambda _j}\left( \left| \theta _{j0}+\xi u_j\right| \right) - p_{\lambda _j}\left( \left| \theta _{j0}\right| \right) \right\} \\&\quad \le \sum _{i=1}^{n}\exp \left\{ -\frac{\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T(\theta _0+\xi u)\right) ^2}{\gamma _n} \right\} -\sum _{i=1}^{n}\exp \left\{ -\frac{\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta _0\right) ^2}{\gamma _n} \right\} -n\sum _{j=1}^{s}\left\{ p_{\lambda _j}\left( \left| \theta _{j0}+\xi u_j\right| \right) - p_{\lambda _j}\left( \left| \theta _{j0}\right| \right) \right\} \\&\quad =\xi D\left( \theta _0,\gamma _n\right) ^T u - \frac{1}{2}n\xi ^2u^T\left[ -I\left( \theta _0,\gamma _n\right) \right] u\left\{ 1+o_p(1)\right\} -\sum _{j=1}^{s}\left[ n\xi p_{\lambda _j}^\prime \left( \left| \theta _{j0}\right| \right) {\text {sign}}\left( \theta _{j0}\right) u_j+n\xi ^2p_{\lambda _j}^{\prime \prime }\left( \left| \theta _{j0}\right| \right) u_j^2\{1+o(1)\}\right] \\&\quad \le \xi D\left( \theta _0,\gamma _n\right) ^T u-\frac{1}{2}n\xi ^2u^T\left[ -I\left( \theta _0,\gamma _n\right) \right] u\left\{ 1+o_p(1)\right\} +n\xi a_n\sum _{j=1}^{s} \left| u_j\right| +n\xi ^2b_n\sum _{j=1}^{s}u_j^2\{1+o(1)\}\\&\quad \le \xi D\left( \theta _0,\gamma _n\right) ^T u-\frac{1}{2}n\xi ^2u^T\left[ -I\left( \theta _0,\gamma _n\right) \right] u\left\{ 1+o_p(1)\right\} +\sqrt{s}\,n\xi a_n\left\| u\right\| +n\xi ^2b_n\left\| u\right\| ^2 \end{aligned} \end{aligned}$$
(A.4)

Note that \(n^{-1/2}D(\theta _0,\gamma _0) = O_p(1)\), so the first term on the right-hand side of (A.4) is of order \(O_p\left( n^{1/2}\xi \right) =O_p \left( n\xi ^2\right)\). By choosing a sufficiently large C, the second term, which is negative and of exact order \(n\xi ^2\), dominates the first term uniformly in \(\left\| u\right\| =C\). Since \(a_n\le \xi\) and \(b_n = o_p(1)\), the third and fourth terms of (A.4) are also dominated by the second term. Therefore, (A.1) holds for a sufficiently large choice of C. \(\square\)

B Proof of Theorem 2

Proof of Theorem 2(1)

We now show the sparsity. By Theorem 1, it suffices to show that, for any \(\theta _1\) satisfying \(\theta _1-\theta _{01}=O_p\left( n^{-1/2}\right)\), some given small \(\varepsilon _n=Cn^{-1/2}\), and \(j=s+1,\ldots ,p\), we have \(\partial \ell /\partial \theta _j<0\) for \(0<\theta _j<\varepsilon _n\), and \(\partial \ell /\partial \theta _j>0\) for \(-\varepsilon _n<\theta _j<0\). Let

$$\begin{aligned} Q_n(\theta ,\gamma )=\sum _{i=1}^{n}\exp \left\{ -\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta \right) ^2/\gamma \right\} {.} \end{aligned}$$
(B.1)

By Taylor’s expansion we have

$$\begin{aligned} \begin{aligned} \frac{\partial \ell (\theta )}{\partial \theta _{j}}&=\frac{\partial Q_{n}\left( \theta , \gamma _{n}\right) }{\partial \theta _{j}}-np_{\lambda _{j}}^{\prime }\left( \left| \theta _{j}\right| \right) {\text {sign}}\left( \theta _{j}\right) \\ =&\frac{\partial Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j}}+\sum _{l=1}^{p} \frac{\partial ^{2} Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j} \partial \theta _{l}}\left( \theta _{l}-\theta _{0 l}\right) \\&+\sum _{l=1}^{p} \sum _{k=1}^{p} \frac{\partial ^{3} Q_{n}\left( \theta ^{*}, \gamma _{n}\right) }{\partial \theta _{j} \partial \theta _{l} \partial \theta _{k}}\left( \theta _{l}-\theta _{0 l}\right) \left( \theta _{k}-\theta _{0 k}\right) -np_{\lambda _{j}}^{\prime }\left( \left| \theta _{j}\right| \right) {\text {sign}}\left( \theta _{j}\right) \end{aligned} \end{aligned}$$

where \(\theta ^{*}\) lies between \(\theta\) and \(\theta _{0}\). Note that

$$\begin{aligned} \begin{aligned} n^{-1} \frac{\partial Q_{n}\left( \theta _{0}, \gamma _{0}\right) }{\partial \theta _{j}}&=O_{p}(n^{-1/2}) \\ n^{-1} \frac{\partial ^{2} Q_{n}\left( \theta _{0}, \gamma _{0}\right) }{\partial \theta _{j} \partial \theta _{l}}&=E\left\{ n^{-1}\frac{\partial ^{2} Q_{n}\left( \theta _{0}, \gamma _{0}\right) }{\partial \theta _{j} \partial \theta _{l}}\right\} +o_{p}(1)\\ n^{-1} \frac{\partial ^{3} Q_{n}\left( \theta ^{*}, \gamma _{n}\right) }{\partial \theta _{j} \partial \theta _{l} \partial \theta _{k}}&=O_{p}(1) \end{aligned} \end{aligned}$$

Since \(b_{n}=o_{p}(1)\) and \(\sqrt{n} a_{n}=o_{p}(1)\), Theorem 1 gives \(\hat{\theta }-\theta _{0}=O_{p}\left( n^{-1 / 2}\right)\). By \(\sqrt{n}\left( \gamma _{n}-\gamma _{0}\right) =O_{p}(1)\), we have

$$\begin{aligned} \frac{\partial \ell (\theta )}{\partial \theta _{j}}=n \lambda _{j}\left\{ -\lambda _{j}^{-1} p_{\lambda _{j}}^{\prime }\left( \left| \theta _{j}\right| \right) {\text {sign}}\left( \theta _{j}\right) +O_{p}\left( n^{-1 / 2} / \lambda _{j}\right) \right\} \end{aligned}$$
(B.2)

Since \(\frac{1}{\min _{s+1 \le j \le p} \sqrt{n} \lambda _{j}}=O_{p}(1)\) and \(\liminf _{n \rightarrow \infty }\liminf _{t \rightarrow 0^{+}}\min _{s+1\le j\le p}p_{\lambda _j}^{\prime }(t)/\lambda _j>0\) with probability 1, the sign of the derivative in (B.2) is completely determined by that of \(\theta _j\): it is negative for \(\theta _j>0\) and positive for \(\theta _j<0\), which completes the proof of the sparsity. \(\square\)

Proof of Theorem 2(2)

By Theorem 1, there exists a \(\sqrt{n}\)-consistent local maximizer \(\hat{\theta }_{n1}\) of \(\ell _n\left\{ (\theta _1,0)\right\}\), which satisfies \(\partial \ell \{(\hat{\theta }_{n1},0)\}/ \partial \theta _j=0\) for \(j=1,\ldots ,s\).

Since \(\hat{\theta }_{n1}\) is a consistent estimator, we have

$$\begin{aligned} \begin{aligned}&\frac{\partial Q_{n}\left\{ \left( \hat{\theta }_{n 1}, 0\right) , \gamma _{n}\right\} }{\partial \theta _{j}}-np_{\lambda _{j}}^{\prime }\left( \left| \theta _{j}\right| \right) {\text {sign}}\left( \theta _{j}\right) \\&\quad =\frac{\partial Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j}}+\sum _{l=1}^{s}\left\{ \frac{\partial ^{2} Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j} \partial \theta _{l}}+o_{p}(1)\right\} \left( \hat{\theta }_{l}-\theta _{0 l}\right) \\&\qquad -n\left[ p_{\lambda _{j}}^{\prime }\left( \left| \theta _{0 j}\right| \right) {\text {sign}}\left( \theta _{0 j}\right) +\left\{ p_{\lambda _{j}}^{\prime \prime }\left( \left| \theta _{0 j}\right| \right) +o_{p}(1)\right\} \left( \hat{\theta }_{j}-\theta _{0 j}\right) \right] =0{.} \\ \end{aligned} \end{aligned}$$
(B.3)

The above equation can be rewritten as follows

$$\begin{aligned} \frac{\partial Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j}}=\sum _{l=1}^{s}\left\{ E\left\{ -n^{-1}\frac{\partial ^{2} Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j} \partial \theta _{l}}\right\} +o_{p}(1)\right\} n\left( \hat{\theta }_{l}-\theta _{0 l}\right) +n\Delta +n\left( \Sigma _{1}+o_{p}(1)\right) \left( \hat{\theta }_{n 1}-\theta _{01}\right) \end{aligned}$$
(B.4)

and

$$\begin{aligned} \begin{aligned}&nI_{1}\left( \theta _{01}, \gamma _{0}\right) \left( \hat{\theta }_{n 1}-\theta _{01}\right) +n\Delta +n\left( \Sigma _{1}+o_{p}(1)\right) \left( \hat{\theta }_{n 1}-\theta _{01}\right) \\&\quad =n\left( I_{1}\left( \theta _{01}, \gamma _{0}\right) +\Sigma _{1}\right) \left( \hat{\theta }_{n 1}-\theta _{01}\right) +n\Delta \\&\quad =n\left( I_{1}\left( \theta _{01}, \gamma _{0}\right) +\Sigma _{1}\right) \left\{ \left( \hat{\theta }_{n 1}-\theta _{01}\right) +\left( I_{1}\left( \theta _{01}, \gamma _{0}\right) +\Sigma _{1}\right) ^{-1} \Delta \right\} \\&\quad =\frac{\partial Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j}}+o_{p}(1){.} \end{aligned} \end{aligned}$$

Since \(\sqrt{n}\left( \gamma _{n}-\gamma _{0}\right) =o_{p}(1)\), the conclusion follows by invoking Slutsky's lemma and the Lindeberg-Feller central limit theorem, where \(\Sigma _{1}={\text {diag}}\left\{ p_{\lambda _{1}}^{\prime \prime }\left( \left| \theta _{01}\right| \right) , \ldots , p_{\lambda _{s}}^{\prime \prime }\left( \left| \theta _{0 s}\right| \right) \right\}\), \(\Sigma _{2}={\text {cov}}\left( \exp \left( -r^{2} / \gamma _{0}\right) \frac{2r}{\gamma _{0}}\tilde{G}_{i 1}\right)\), \(\Delta =\left( p_{\lambda _{1}}^{\prime }\left( \left| \theta _{01}\right| \right) {\text {sign}}\left( \theta _{01}\right) , \ldots , p_{\lambda _{s}}^{\prime }\left( \left| \theta _{0 s}\right| \right) {\text {sign}}\left( \theta _{0 s}\right) \right) ^{T}\), and \(I_{1}\left( \theta _{01}, \gamma _{0}\right) =\frac{2}{\gamma _{0}} E\left[ \exp \left( -r^{2} / \gamma _{0}\right) \left( \frac{2 r^{2}}{\gamma _{0}}-1\right) \right] \times \left( E \tilde{G}_{i 1} \tilde{G}_{i 1}^{T}\right)\). \(\square\)
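
Explicitly, the limiting law obtained above can be written as follows (a restatement consistent with the displays and with the analogous result in Wang et al. 2013; the exact normalization should be checked against the theorem statement in the main text):

$$\begin{aligned} \sqrt{n}\left( I_{1}\left( \theta _{01}, \gamma _{0}\right) +\Sigma _{1}\right) \left\{ \left( \hat{\theta }_{n 1}-\theta _{01}\right) +\left( I_{1}\left( \theta _{01}, \gamma _{0}\right) +\Sigma _{1}\right) ^{-1} \Delta \right\} {\mathop {\longrightarrow }\limits ^{d}} N\left( 0, \Sigma _{2}\right) {.} \end{aligned}$$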

About this article

Cite this article

Yu, J., Song, Y. & Du, J. Robust variable selection with exponential squared loss for the partially linear varying coefficient spatial autoregressive model. Environ Ecol Stat 31, 97–127 (2024). https://doi.org/10.1007/s10651-024-00603-z
