Robust variable selection with exponential squared loss for the partially linear varying coefficient spatial autoregressive model

Abstract

The partially linear varying coefficient spatial autoregressive model is a semiparametric spatial autoregressive model in which the coefficients of some explanatory variables are allowed to vary while the coefficients of the remaining explanatory variables are held constant. For the nonparametric part, a local linear smoothing method is used to estimate the vector of coefficient functions. To address the variable selection problem, this paper proposes a penalized robust regression estimator based on the exponential squared loss, which selects important explanatory variables and estimates the parameters simultaneously. A solution algorithm is constructed by combining the block coordinate descent (BCD) algorithm with the concave-convex procedure (CCCP). The robustness of the proposed variable selection method is demonstrated through numerical simulations and illustrated with housing data from Airbnb.
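
As a rough illustration of the objective being maximized (a minimal sketch, not the authors' implementation: the variable names, the SCAD choice of \(p_\lambda\) with its conventional constant \(a=3.7\), and the use of a single scalar tuning parameter in place of the coordinate-wise \(\lambda_j\) are assumptions for concreteness), the penalized exponential squared loss, cf. (A.2) in Appendix A, can be written as follows:

```python
import numpy as np

def exp_squared_term(residuals, gamma):
    # Exponential squared "likelihood": residuals far from zero contribute
    # almost nothing, which is the source of the estimator's robustness.
    return np.exp(-residuals**2 / gamma)

def scad_penalty(theta, lam, a=3.7):
    # SCAD penalty of Fan and Li (2001); one common choice of p_lambda.
    t = np.abs(theta)
    return np.where(
        t <= lam,
        lam * t,
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )

def penalized_objective(theta, y, g, gamma, lam):
    # l_n(theta) of (A.2): summed exponential squared term minus n times
    # the penalty; the BCD/CCCP algorithm maximizes this over theta.
    n = len(y)
    return exp_squared_term(y - g @ theta, gamma).sum() - n * scad_penalty(theta, lam).sum()
```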

References

  • Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2(1):183–202

  • Cliff A, Ord J (1973) Spatial autocorrelation. Pion, London

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

  • Forsythe GE, Malcolm MA, Moler CB (1977) Computer methods for mathematical computations. Prentice-Hall, Englewood Cliffs

  • Guo S, Wei CH (2015) Variable selection for spatial autoregressive models. J Minzu Univ China (Nat Sci Ed)

  • Kelejian HH (2008) A spatial J-test for model specification against a single or a set of non-nested alternatives. Lett Spat Resour Sci 1(1):3–11

  • Kelejian HH, Piras G (2011) An extension of Kelejian’s J-test for non-nested spatial models. Reg Sci Urban Econ 41(3):281–292

  • Kelejian HH, Piras G (2014) An extension of the J-test to a spatial panel data framework. J Appl Econom 31(2):387–402

  • Li T, Yin Q, Peng J (2020) Variable selection of partially linear varying coefficient spatial autoregressive model. J Stat Comput Simul 90(15):2681–2704

  • Liu X, Chen J, Cheng S (2018) A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat Stat 25:86–104

  • Ma Y, Pan R, Zou T, Wang H (2020) A naive least squares method for spatial autoregression with covariates. Stat Sin 30(2):653–672

  • Mu J, Wang G, Wang L (2020) Spatial autoregressive partially linear varying coefficient models. J Nonparametric Stat 32(2):428–451

  • Song Y, Liang X, Zhu Y, Lin L (2021) Robust variable selection with exponential squared loss for the spatial autoregressive model. Comput Stat Data Anal 155:107094

  • Su L, Jin S (2010) Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J Econom 157(1):18–33

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267–288

  • Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-lasso. J Bus Econ Stat 25(3):347–355

  • Wang X, Jiang Y, Huang M, Zhang H (2013) Robust variable selection with exponential squared loss. J Am Stat Assoc 108(502):632–643

  • Yuille AL, Rangarajan A (2003) The concave-convex procedure. Neural Comput 15(4):915–936

  • Zhang X, Yu J (2018) Spatial weights matrix selection and model averaging for spatial autoregressive models. J Econom 203(1):1–18

Author information

Contributions

JY and YS wrote the main manuscript text and JD prepared Tables 1–3. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yunquan Song.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Handling Editor: Luiz Duczmal.

This research was supported by the Fundamental Research Funds for the Central Universities (No. 23CX03012A) and the National Key Research and Development Program of China (2021YFA1000102).

Appendices

A Proof of Theorem 1

Let \(\xi =n^{-1/2}+a_n\). Similar to Fan and Li (2001), we first prove that for any given \(\varepsilon >0\), there exists a constant C such that

$$\begin{aligned} P\left\{ \sup _{\parallel u\parallel =C}\ell \left( \theta _0+\xi u\right) <\ell \left( \theta _0\right) \right\} \ge 1-\varepsilon \end{aligned}$$
(A.1)

where u is a (p+1)-dimensional vector such that \(\parallel u\parallel =C\). Display (A.1) implies that, with probability at least \(1-\varepsilon\), there exists a local maximizer \(\hat{\theta }_n\) in the ball \(\left\{ \theta _0+\xi u:\parallel u\parallel \le C\right\}\), so that \(\left\| \hat{\theta }_n-\theta _0\right\| =O_p(\xi )\). Note that minimizing (10) is equivalent to maximizing

$$\begin{aligned} \ell _n\left( \theta \right) =\sum _{i=1}^{n}{\exp \left\{ -\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta \right) ^2/\gamma _n\right\} }-n\sum _{j=1}^{p}{p_{\lambda _j}\left( \left| \theta _j\right| \right) }{.} \end{aligned}$$
(A.2)

Let

$$\begin{aligned} D(\theta ,\gamma )=\sum _{i=1}^{n}\exp \left\{ -\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta \right) ^2/\gamma \right\} \frac{2\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta \right) }{\gamma }{\widetilde{G}}_i{.} \end{aligned}$$
(A.3)
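
The vector \(D(\theta ,\gamma )\) in (A.3) is the gradient of the loss term of \(\ell _n\) with respect to \(\theta\). A quick finite-difference check confirms the formula; this is a hedged sketch with synthetic stand-ins for \({\widetilde{Y}}\) and \({\widetilde{G}}\) (the data, dimensions, and seed below are illustrative assumptions, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4
G = rng.normal(size=(n, p))          # stand-in for the transformed design G-tilde
theta0 = np.array([1.0, -0.5, 0.0, 2.0])
Y = G @ theta0 + rng.normal(size=n)  # stand-in for Y-tilde
gamma = 2.0

def Q(theta):
    # First (loss) term of l_n: sum of exponential squared contributions.
    r = Y - G @ theta
    return np.exp(-r**2 / gamma).sum()

def D(theta):
    # Analytic gradient of Q, matching (A.3): sum_i w_i * (2 r_i / gamma) * G_i.
    r = Y - G @ theta
    return (np.exp(-r**2 / gamma) * (2 * r / gamma)) @ G

# Central finite differences should agree with D(theta) to high accuracy.
theta = rng.normal(size=p)
eps = 1e-6
num_grad = np.array([
    (Q(theta + eps * e) - Q(theta - eps * e)) / (2 * eps)
    for e in np.eye(p)
])
assert np.allclose(num_grad, D(theta), rtol=1e-4, atol=1e-6)
```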

Since \(p_{\lambda _j}(0)=0\) for \(j=1,\ldots ,p\) and \(\gamma _n-\gamma _0=o_p(1)\), by Taylor’s expansion we have

$$\begin{aligned} \begin{aligned}&\ell \left( \theta _0+\xi u\right) -\ell \left( \theta _0\right) \\&\quad =\sum _{i=1}^{n}\exp \left\{ -\frac{\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T(\theta _0+\xi u)\right) ^2}{\gamma _n} \right\} -\sum _{i=1}^{n}\exp \left\{ -\frac{\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta _0\right) ^2}{\gamma _n} \right\} -n\sum _{j=1}^{p}\left\{ p_{\lambda _j}\left( \left| \theta _{j0}+\xi u_j\right| \right) - p_{\lambda _j}\left( \left| \theta _{j0}\right| \right) \right\} \\&\quad \le \sum _{i=1}^{n}\exp \left\{ -\frac{\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T(\theta _0+\xi u)\right) ^2}{\gamma _n} \right\} -\sum _{i=1}^{n}\exp \left\{ -\frac{\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta _0\right) ^2}{\gamma _n} \right\} -n\sum _{j=1}^{s}\left\{ p_{\lambda _j}\left( \left| \theta _{j0}+\xi u_j\right| \right) - p_{\lambda _j}\left( \left| \theta _{j0}\right| \right) \right\} \\&\quad =\xi D\left( \theta _0,\gamma _n\right) ^T u - \frac{1}{2}n\xi ^2u^T\left[ -I\left( \theta _0,\gamma _n\right) \right] u\left\{ 1+o_p(1)\right\} -\sum _{j=1}^{s}\left[ n\xi p_{\lambda _j}^\prime \left( \left| \theta _{j0}\right| \right) {\text {sign}}\left( \theta _{j0}\right) u_j+n\xi ^2p_{\lambda _j}^{\prime \prime }\left( \left| \theta _{j0}\right| \right) u_j^2\{1+o(1)\}\right] \\&\quad \le \xi D\left( \theta _0,\gamma _n\right) ^T u-\frac{1}{2}n\xi ^2u^T\left[ -I\left( \theta _0,\gamma _n\right) \right] u\left\{ 1+o_p(1)\right\} +n\xi a_n\sum _{j=1}^{s} \left| u_j\right| +n\xi ^2b_n\sum _{j=1}^{s}u_j^2\{1+o(1)\}\\&\quad \le \xi D\left( \theta _0,\gamma _n\right) ^T u-\frac{1}{2}n\xi ^2u^T\left[ -I\left( \theta _0,\gamma _n\right) \right] u\left\{ 1+o_p(1)\right\} +\sqrt{s}\,n\xi a_n\left\| u\right\| +n\xi ^2b_n\left\| u\right\| ^2 \end{aligned} \end{aligned}$$
(A.4)

Note that \(n^{-1/2}D(\theta _0,\gamma _0) = O_p(1)\), so the first term on the right-hand side of (A.4) is of order \(O_p\left( n^{1/2}\xi \right) =O_p \left( n\xi ^2\right)\). By choosing a sufficiently large C, the second term, which is negative and of exact order \(n\xi ^2\), dominates the first term uniformly in \(\left\| u\right\| =C\). Since \(a_n\le \xi\) and \(b_n = o_p(1)\), the third and fourth terms of (A.4) are also dominated by the second term. Therefore, (A.1) holds for a sufficiently large choice of C. \(\square\)

B Proof of Theorem 2

Proof of Theorem 2(1)

We now show the sparsity. By Theorem 1, it suffices to show that, for any \(\theta _1\) satisfying \(\theta _1-\theta _{01}=O_p\left( n^{-1/2}\right)\), some given small \(\varepsilon _n=Cn^{-1/2}\), and \(j=s+1,\ldots ,p\), we have \(\partial \ell /\partial \theta _j<0\) for \(0<\theta _j<\varepsilon _n\), and \(\partial \ell /\partial \theta _j>0\) for \(-\varepsilon _n<\theta _j<0\). Let

$$\begin{aligned} Q_n(\theta ,\gamma )=\sum _{i=1}^{n}\exp \left\{ -\left( {\widetilde{Y}}_i-{\widetilde{G}}_i^T\theta \right) ^2/\gamma \right\} {.} \end{aligned}$$
(B.1)

By Taylor’s expansion we have

$$\begin{aligned} \begin{aligned} \frac{\partial \ell (\theta )}{\partial \theta _{j}}&=\frac{\partial Q_{n}\left( \theta , \gamma _{n}\right) }{\partial \theta _{j}}-np_{\lambda _{j}}^{\prime }\left( \left| \theta _{j}\right| \right) {\text {sign}}\left( \theta _{j}\right) \\ =&\frac{\partial Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j}}+\sum _{l=1}^{p} \frac{\partial ^{2} Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j} \partial \theta _{l}}\left( \theta _{l}-\theta _{0 l}\right) \\&+\sum _{l=1}^{p} \sum _{k=1}^{p} \frac{\partial ^{3} Q_{n}\left( \theta ^{*}, \gamma _{n}\right) }{\partial \theta _{j} \partial \theta _{l} \partial \theta _{k}}\left( \theta _{l}-\theta _{0 l}\right) \left( \theta _{k}-\theta _{0 k}\right) -np_{\lambda _{j}}^{\prime }\left( \left| \theta _{j}\right| \right) {\text {sign}}\left( \theta _{j}\right) \end{aligned} \end{aligned}$$

where \(\theta ^{*}\) lies between \(\theta\) and \(\theta _{0}\). Note that

$$\begin{aligned} \begin{aligned} n^{-1} \frac{\partial Q_{n}\left( \theta _{0}, \gamma _{0}\right) }{\partial \theta _{j}}&=O_{p}(n^{-1/2}) \\ n^{-1} \frac{\partial ^{2} Q_{n}\left( \theta _{0}, \gamma _{0}\right) }{\partial \theta _{j} \partial \theta _{l}}&=E\left\{ n^{-1}\frac{\partial ^{2} Q_{n}\left( \theta _{0}, \gamma _{0}\right) }{\partial \theta _{j} \partial \theta _{l}}\right\} +o_{p}(1)\\ n^{-1} \frac{\partial ^{3} Q_{n}\left( \theta ^{*}, \gamma _{n}\right) }{\partial \theta _{j} \partial \theta _{l} \partial \theta _{k}}&=O_{p}(1) \end{aligned} \end{aligned}$$

Since \(b_{n}=o_{p}(1)\) and \(\sqrt{n} a_{n}=o_{p}(1)\), Theorem 1 gives \(\hat{\theta }-\theta _{0}=O_{p}\left( n^{-1 / 2}\right)\). By \(\sqrt{n}\left( \gamma _{n}-\gamma _{0}\right) =O_{p}(1)\), we have

$$\begin{aligned} \frac{\partial \ell (\theta )}{\partial \theta _{j}}=n \lambda _{j}\left\{ -\lambda _{j}^{-1} p_{\lambda _{j}}^{\prime }\left( \left| \theta _{j}\right| \right) {\text {sign}}\left( \theta _{j}\right) +O_{p}\left( n^{-1 / 2} / \lambda _{j}\right) \right\} \end{aligned}$$
(B.2)

Since \(\frac{1}{\min _{s+1 \le j \le p} \sqrt{n} \lambda _{j}}=O_{p}(1)\) and \(\liminf _{n \rightarrow \infty }\liminf _{t \rightarrow 0^{+}}\min _{s+1\le j\le p}p_{\lambda _j}^{\prime }(t)/\lambda _j>0\) with probability 1, the sign of the derivative in (B.2) is completely determined by that of \(\theta _j\): it is negative for \(\theta _j>0\) and positive for \(\theta _j<0\), which completes the proof of the sparsity. \(\square\)

Proof of Theorem 2(2)

By Theorem 1, there exists a \(\sqrt{n}\)-consistent local maximizer \(\hat{\theta }_{n1}\) of \(\ell _n\left\{ (\theta _1,0)\right\}\), which satisfies \(\partial \ell \{(\hat{\theta }_{n1},0)\}/ \partial \theta _j=0\) for \(j=1,\ldots ,s\).

Since \(\hat{\theta }_{n1}\) is a consistent estimator, we have

$$\begin{aligned} \begin{aligned}&\frac{\partial Q_{n}\left\{ \left( \hat{\theta }_{n 1}, 0\right) , \gamma _{n}\right\} }{\partial \theta _{j}}-np_{\lambda _{j}}^{\prime }\left( \left| \theta _{j}\right| \right) {\text {sign}}\left( \theta _{j}\right) \\&\quad =\frac{\partial Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j}}+\sum _{l=1}^{s}\left\{ \frac{\partial ^{2} Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j} \partial \theta _{l}}+o_{p}(1)\right\} \left( \hat{\theta }_{l}-\theta _{0 l}\right) \\&\qquad -n\left[ p_{\lambda _{j}}^{\prime }\left( \left| \theta _{0 j}\right| \right) {\text {sign}}\left( \theta _{0 j}\right) +\left\{ p_{\lambda _{j}}^{\prime \prime }\left( \left| \theta _{0 j}\right| \right) +o_{p}(1)\right\} \left( \hat{\theta }_{j}-\theta _{0 j}\right) \right] =0{.} \\ \end{aligned} \end{aligned}$$
(B.3)

The above equation can be rewritten as follows

$$\begin{aligned} \frac{\partial Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j}}=\sum _{l=1}^{s}\left\{ E\left\{ -n^{-1}\frac{\partial ^{2} Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j} \partial \theta _{l}}\right\} +o_{p}(1)\right\} n\left( \hat{\theta }_{l}-\theta _{0 l}\right) +n\Delta +n\left( \Sigma _{1}+o_{p}(1)\right) \left( \hat{\theta }_{n 1}-\theta _{01}\right) \end{aligned}$$
(B.4)

and

$$\begin{aligned} \begin{aligned}&nI_{1}\left( \theta _{01}, \gamma _{0}\right) \left( \hat{\theta }_{n 1}-\theta _{01}\right) +n\Delta +n\left( \Sigma _{1}+o_{p}(1)\right) \left( \hat{\theta }_{n 1}-\theta _{01}\right) \\&\quad =n\left( I_{1}\left( \theta _{01}, \gamma _{0}\right) +\Sigma _{1}\right) \left( \hat{\theta }_{n 1}-\theta _{01}\right) +n\Delta \\&\quad =n\left( I_{1}\left( \theta _{01}, \gamma _{0}\right) +\Sigma _{1}\right) \left\{ \left( \hat{\theta }_{n 1}-\theta _{01}\right) +\left( I_{1}\left( \theta _{01}, \gamma _{0}\right) +\Sigma _{1}\right) ^{-1} \Delta \right\} \\&\quad =\frac{\partial Q_{n}\left( \theta _{0}, \gamma _{n}\right) }{\partial \theta _{j}}+o_{p}(1){.} \end{aligned} \end{aligned}$$

Since \(\sqrt{n}\left( \gamma _{n}-\gamma _{0}\right) =o_{p}(1)\), the conclusion follows by invoking Slutsky's lemma and the Lindeberg-Feller central limit theorem, where \(\Sigma _{1}={\text {diag}}\left\{ p_{\lambda _{1}}^{\prime \prime }\left( \left| \theta _{01}\right| \right) , \ldots , p_{\lambda _{s}}^{\prime \prime }\left( \left| \theta _{0 s}\right| \right) \right\}\), \(\Sigma _{2}={\text {cov}}\left( \exp \left( -r^{2} / \gamma _{0}\right) \frac{2r}{\gamma _{0}}\tilde{G}_{i 1}\right)\), \(\Delta =\left( p_{\lambda _{1}}^{\prime }\left( \left| \theta _{01}\right| \right) {\text {sign}}\left( \theta _{01}\right) , \ldots , p_{\lambda _{s}}^{\prime }\left( \left| \theta _{0 s}\right| \right) {\text {sign}}\left( \theta _{0 s}\right) \right) ^{T}\), and \(I_{1}\left( \theta _{01}, \gamma _{0}\right) =\frac{2}{\gamma _{0}} E\left[ \exp \left( -r^{2} / \gamma _{0}\right) \left( \frac{2 r^{2}}{\gamma _{0}}-1\right) \right] \times \left( E \tilde{G}_{i 1} \tilde{G}_{i 1}^{T}\right)\). \(\square\)
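
Explicitly, the limiting law obtained above can be written as follows (a restatement consistent with the displays and with the analogous result in Wang et al. 2013; the exact normalization should be checked against the theorem statement in the main text):

$$\begin{aligned} \sqrt{n}\left( I_{1}\left( \theta _{01}, \gamma _{0}\right) +\Sigma _{1}\right) \left\{ \left( \hat{\theta }_{n 1}-\theta _{01}\right) +\left( I_{1}\left( \theta _{01}, \gamma _{0}\right) +\Sigma _{1}\right) ^{-1} \Delta \right\} {\mathop {\longrightarrow }\limits ^{d}} N\left( 0, \Sigma _{2}\right) {.} \end{aligned}$$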

About this article

Cite this article

Yu, J., Song, Y. & Du, J. Robust variable selection with exponential squared loss for the partially linear varying coefficient spatial autoregressive model. Environ Ecol Stat 31, 97–127 (2024). https://doi.org/10.1007/s10651-024-00603-z
