
Huber Loss Meets Spatial Autoregressive Model: A Robust Variable Selection Method with Prior Information


Abstract

The advent of high-dimensional data has made variable selection increasingly important, and regularization is a popular technique for simultaneous variable selection and parameter estimation. Spatial data, however, are more intricate than ordinary data because of spatial correlation and non-stationarity. This article proposes a robust regularized regression estimator for the spatial autoregressive model based on the Huber loss and a generalized Lasso penalty. In addition, linear equality and inequality constraints encoding prior information are incorporated to improve the efficiency and accuracy of estimation. To characterize the proposed estimator, we derive its Karush-Kuhn-Tucker (KKT) conditions and establish a set of quantities, including a formula for its degrees of freedom, from which we construct AIC and BIC information criteria for selecting the tuning parameters in numerical simulations. Using the classic Boston Housing dataset, we compare the proposed model with its squared-loss counterpart in scenarios with and without outliers. The results show that the proposed model achieves robust variable selection. This work provides a new approach to spatial data analysis with broad applications in fields such as economics, ecology, and medicine.
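
To make the setup concrete, the following is a minimal sketch (not the authors' implementation) of the kind of estimator described above, written with the off-the-shelf convex solver cvxpy. All names here (W for the spatial weight matrix, rho for a fixed spatial lag parameter, D for the generalized-Lasso penalty matrix, A_eq/b_eq and A_in/b_in for the prior linear constraints) are illustrative assumptions; in particular, the spatial lag is treated as known, whereas the paper estimates the full spatial autoregressive model.

```python
import cvxpy as cp
import numpy as np

def huber_lcg_lasso_sketch(y, X, W, rho, D, lam, M=1.345,
                           A_eq=None, b_eq=None, A_in=None, b_in=None):
    """Illustrative only: Huber loss + generalized-Lasso penalty with
    linear equality/inequality constraints. The spatial lag parameter
    rho is treated as known here for simplicity; the paper estimates
    the full spatial autoregressive model."""
    n, p = X.shape
    beta = cp.Variable(p)
    # Spatially filtered response (I - rho * W) y.
    y_tilde = (np.eye(n) - rho * W) @ y
    # Robust Huber loss on residuals plus generalized-Lasso penalty ||D beta||_1.
    obj = cp.sum(cp.huber(y_tilde - X @ beta, M)) + lam * cp.norm1(D @ beta)
    constraints = []
    if A_eq is not None:
        constraints.append(A_eq @ beta == b_eq)  # prior equality information
    if A_in is not None:
        constraints.append(A_in @ beta <= b_in)  # prior inequality information
    cp.Problem(cp.Minimize(obj), constraints).solve()
    return beta.value
```

In such a sketch, the tuning parameter lam would be chosen by the AIC/BIC criteria built from the degrees-of-freedom formula derived in the appendix.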


Data Availability

The data that support the findings of this study are available from the corresponding author upon request.


Funding

This research was supported by the National Key Research and Development Program of China (2021YFA1000102).

Author information


Contributions

Yunquan Song came up with the idea and developed the theory. Minmin Zhan and Yue Zhang conceived of the presented approach; Yue Zhang performed the computations, and Minmin Zhan verified the analytical methods. Yunquan Song, Minmin Zhan, and Yue Zhang contributed to the final version of the manuscript. Yongxin Liu supervised the project.

Corresponding author

Correspondence to Yunquan Song.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

This research was supported by the Fundamental Research Funds for the Central Universities (No. 23CX03012A) and the National Key Research and Development Program of China (2021YFA1000102).

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 1 Let \(P_{\text {null}}\) be the projection matrix onto \({\text {null}}\left( G_{-A, B}\right)\), so that \(P_{\text {null}} G_{-A, B}^T=0\). Multiplying both sides of Eq. (2.21) by \(P_{\text {null}}\) yields:

$$\begin{aligned} P_{\text {null}} X_{-v}^{*T} X_{-v}^* \hat{\beta }^*&=P_{\text {null}} X_{-v}^{*T} y_{-v}-P_{\text {null}} D_A^{*T} \lambda s_A-P_{\text {null}} G_{-A, B}^T \hat{\theta }_{-A, B}+P_{\text {null}} X_v^{*T} t s_v \\&=P_{\text {null}} X_{-v}^{*T} y_{-v}-P_{\text {null}} D_A^{*T} \lambda s_A+P_{\text {null}} X_v^{*T} t s_v \end{aligned}$$
(A.1)

\(\hat{\beta }^*\) can be decomposed into the sum of two parts as follows:

$$\begin{aligned} \begin{aligned} \hat{\beta }^*&=P_{\text{ null } } \hat{\beta }^*+P_{\text{ col } \left( G_{-A, B}^T \right) } \hat{\beta }^* \\&=P_{\text{ null } } \hat{\beta }^*+G_{-A, B}^T\left( G_{-A, B} G_{-A, B}^T\right) ^{+} G_{-A, B} \hat{\beta }^* \\&=P_{\text{ null } } \hat{\beta }^*-A g_{-A, B} \end{aligned} \end{aligned}$$

The last equality holds because \(G_{-A, B} \hat{\beta }^*=g_{-A, B}\). Substituting the expression for \(\hat{\beta }^*\) into (A.1) and simplifying, we obtain:

$$\begin{aligned} P_{\text {null}} X_{-v}^{*T} X_{-v}^* P_{\text {null}} \hat{\beta }^*&=P_{\text {null}} X_{-v}^{*T} y_{-v}-P_{\text {null}} D_A^{*T} \lambda s_A+P_{\text {null}} X_v^{*T} t s_v+P_{\text {null}} X_{-v}^{*T} X_{-v}^* A g_{-A, B} \\&=P_{\text {null}} X_{-v}^{*T}\left[ y_{-v}-\left( X_{-v}^* P_{\text {null}} X_{-v}^{*T}\right) ^{+} X_{-v}^* P_{\text {null}}\left( D_A^{*T} \lambda s_A-X_v^{*T} t s_v\right) +X_{-v}^* A g_{-A, B}\right] \end{aligned}$$

The last equality holds because, by (A.1), \(P_{\text {null}} D_A^{*T} \lambda s_A-P_{\text {null}} X_v^{*T} t s_v \in {\text {col}}\left( P_{\text {null}} X_{-v}^{*T}\right)\). Further, we can derive:

$$\begin{aligned} P_{\text {null}} D_A^{*T} \lambda s_A-P_{\text {null}} X_v^{*T} t s_v=\left( P_{\text {null}} X_{-v}^{*T}\right) \left( X_{-v}^* P_{\text {null}} X_{-v}^{*T}\right) ^{+}\left( X_{-v}^* P_{\text {null}}\right) \left( P_{\text {null}} D_A^{*T} \lambda s_A-P_{\text {null}} X_v^{*T} t s_v\right) \end{aligned}$$

Therefore,

$$\begin{aligned} X_{-v}^* P_{\text {null}} \hat{\beta }^*&=X_{-v}^* P_{\text {null}}\left( P_{\text {null}} X_{-v}^{*T} X_{-v}^* P_{\text {null}}\right) ^{+} P_{\text {null}} X_{-v}^{*T}\left[ y_{-v}-\left( X_{-v}^* P_{\text {null}} X_{-v}^{*T}\right) ^{+} X_{-v}^* P_{\text {null}}\left( D_A^{*T} \lambda s_A-X_v^{*T} t s_v\right) +X_{-v}^* A g_{-A, B}\right] \\&=P_{X_{-v}^* P_{\text {null}}}\left[ y_{-v}-\left( X_{-v}^* P_{\text {null}} X_{-v}^{*T}\right) ^{+} X_{-v}^* P_{\text {null}}\left( D_A^{*T} \lambda s_A-X_v^{*T} t s_v\right) +X_{-v}^* A g_{-A, B}\right] \end{aligned}$$

Therefore, the fitted values \(X_{-v}^* \hat{\beta }^*\) can be expressed as:

$$\begin{aligned} X_{-v}^* \hat{\beta }^*&=X_{-v}^* P_{\text {null}} \hat{\beta }^*-X_{-v}^* A g_{-A, B} \\&=P_{X_{-v}^* P_{\text {null}}}\left[ y_{-v}-\left( X_{-v}^* P_{\text {null}} X_{-v}^{*T}\right) ^{+} X_{-v}^* P_{\text {null}}\left( D_A^{*T} \lambda s_A-X_v^{*T} t s_v\right) +X_{-v}^* A g_{-A, B}\right] -X_{-v}^* A g_{-A, B} \\&=P_{X_{-v}^* P_{\text {null}}}\left[ y_{-v}-\left( X_{-v}^* P_{\text {null}} X_{-v}^{*T}\right) ^{+} X_{-v}^* P_{\text {null}}\left( D_A^{*T} \lambda s_A-X_v^{*T} t s_v\right) \right] -\left( I-P_{X_{-v}^* P_{\text {null}}}\right) X_{-v}^* A g_{-A, B} \end{aligned}$$

Proof of Theorem 2 The fit \(\hat{\mu }(y)=\hat{y}\) is continuous and almost everywhere differentiable with respect to \(y\), so Stein's lemma can be used to calculate the degrees of freedom of the Space Huber Lcg-Lasso fitted values. Thus,

$$\begin{aligned} \textrm{df}(\hat{\mu })=\mathbb {E}\left[ \sum _{i=1}^n \frac{\partial \hat{y}_i}{\partial y_i}\right] =\mathbb {E}\left[ \sum _{i\in \mathcal {V}} \frac{\partial \hat{y}_i}{\partial y_i}+\sum _{i \in \mathcal {V}^C} \frac{\partial \hat{y}_i}{\partial y_i}\right] \end{aligned}$$
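
For completeness, the display above is the standard divergence form of the degrees of freedom; assuming Gaussian errors \(y \sim N(\mu , \sigma ^2 I)\), as is usual for this type of argument, Stein's lemma equates the covariance definition of degrees of freedom with the expected divergence of the fit:

$$\begin{aligned} \textrm{df}(\hat{\mu })=\frac{1}{\sigma ^2} \sum _{i=1}^n {\text {Cov}}\left( \hat{y}_i, y_i\right) =\mathbb {E}\left[ \sum _{i=1}^n \frac{\partial \hat{y}_i}{\partial y_i}\right] \end{aligned}$$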

As \(\hat{\beta }^*\) depends only on \(y_{-v}\), the derivatives of the fitted values \(\hat{y}_i\) with respect to \(y_i\) are 0 for \(i \in \mathcal {V}\), i.e.,

$$\begin{aligned} \frac{\partial \hat{y}_i}{\partial y_i}=\frac{\partial x_i^T \hat{\beta }\left( y_{-v}\right) }{\partial y_i}=0, \quad \text {for } i \in \mathcal {V} \end{aligned}$$

Thus, the expression for the degrees of freedom for the fitted values \(\hat{\mu }\) becomes

$$\begin{aligned} \textrm{df}(\hat{\mu })=\mathbb {E}\left[ \sum _{i \in \mathcal {V}^C} \frac{\partial \hat{y}_i}{\partial y_i}\right] =\mathbb {E}\left[ \left( \nabla \cdot \hat{y}_{-v}\right) \left( y_{-v}\right) \right] \end{aligned}$$

Considering the expression of \(\hat{y}_{-v}\) from Theorem 1, we have

$$\begin{aligned} \hat{y}_{-v}&=P_{X_{-v}^* P_{\text {null}}} y_{-v}-P_{X_{-v}^* P_{\text {null}}}\left( X_{-v}^* P_{\text {null}} X_{-v}^{*T}\right) ^{+} X_{-v}^* P_{\text {null}}\left( D_A^{*T} \lambda s_A-X_v^{*T} t s_v\right) \\&\quad -\left( I-P_{X_{-v}^* P_{\text {null}}}\right) X_{-v}^* A g_{-A, B}. \end{aligned}$$

The first term on the right-hand side depends directly on \(y\), while the remaining terms depend only on the boundary sets \(\mathcal {A}\), \(\mathcal {B}\), \(\mathcal {V}\) and the signs \(s_{\mathcal {A}}\), \(s_{\mathcal {V}}\), which are locally constant in a neighborhood of \(y\); hence their derivatives with respect to \(y\) are zero. Therefore,

$$\begin{aligned} \textrm{df}(\hat{\mu })=\mathbb {E}\left[ \left( \nabla \cdot \hat{y}_{-v}\right) \left( y_{-v}\right) \right] ={\text {tr}}\left( P_{X_{-v}^* P_{\text {null}}}\right) \end{aligned}$$

Since the trace of the projection matrix equals the dimension of the corresponding linear space, the theorem is established.
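
As a small numerical illustration of this last step (a generic sketch with random matrices, not tied to the paper's data), the degrees of freedom can be computed as the trace of the orthogonal projection onto \({\text {col}}\left( X_{-v}^* P_{\text {null}}\right)\), which equals the rank of \(X_{-v}^* P_{\text {null}}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 50, 10, 3
X = rng.standard_normal((n, p))   # stands in for X_{-v}^*
G = rng.standard_normal((m, p))   # stands in for G_{-A,B}

# Projection onto null(G): P_null = I - G^T (G G^T)^+ G.
P_null = np.eye(p) - G.T @ np.linalg.pinv(G @ G.T) @ G

# Orthogonal projection onto col(X P_null); its trace equals rank(X P_null).
XP = X @ P_null
P_col = XP @ np.linalg.pinv(XP)

print(np.trace(P_col))             # approximately 7.0 here (p - m)
print(np.linalg.matrix_rank(XP))   # 7
```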

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Song, Y., Zhan, M., Zhang, Y. et al. Huber Loss Meets Spatial Autoregressive Model: A Robust Variable Selection Method with Prior Information. Netw Spat Econ 24, 291–311 (2024). https://doi.org/10.1007/s11067-024-09614-6

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11067-024-09614-6

Keywords

Navigation