Robust instance-dependent cost-sensitive classification

  • Regular Article
  • Published in Advances in Data Analysis and Classification

Abstract

Instance-dependent cost-sensitive (IDCS) learning methods have proven useful for binary classification tasks in which individual instances carry variable misclassification costs. However, we demonstrate in this paper, by means of a series of experiments, that IDCS methods are sensitive to noise and outliers in the instance-dependent misclassification costs and that their performance depends strongly on the cost distribution of the data sample. We therefore propose a generic three-step framework to make IDCS methods more robust: (i) detect outliers automatically, (ii) correct outlying cost information in a data-driven way, and (iii) construct an IDCS learning method using the adjusted cost information. We apply this framework to cslogit, a logistic regression-based IDCS method, to obtain its robust version, which we name r-cslogit. The robustness of this approach is introduced in steps (i) and (ii), where we use robust estimators to detect and impute outlying costs of individual instances. The newly proposed r-cslogit method is tested on synthetic and semi-synthetic data and shown to be superior in terms of savings compared to its non-robust counterpart for variable levels of noise and outliers. All our code is made available online at https://github.com/SimonDeVos/Robust-IDCS.
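
To make the three-step framework concrete, the following is a minimal sketch in Python. It is not the paper's implementation (see the linked repository for that): the function names are hypothetical, the median/MAD rule is one illustrative choice of robust estimator for steps (i) and (ii), and step (iii) approximates IDCS learning with a cost-weighted logistic regression rather than the actual cslogit objective.

# Minimal sketch of the three-step robustification framework.
# Assumptions: function names and the median/MAD rule are illustrative
# choices, not the paper's exact implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def detect_outlying_costs(costs, threshold=3.0):
    # Step (i): flag costs that lie far from the median in robust units.
    # The MAD is scaled by 1.4826 so it estimates the standard deviation
    # under Gaussian data, yet is not distorted by the outliers themselves.
    med = np.median(costs)
    mad = 1.4826 * np.median(np.abs(costs - med))
    return np.abs(costs - med) > threshold * mad

def correct_outlying_costs(costs, is_outlier):
    # Step (ii): impute flagged costs from the non-outlying observations.
    adjusted = costs.copy()
    adjusted[is_outlier] = np.median(costs[~is_outlier])
    return adjusted

def fit_idcs_model(X, y, costs):
    # Step (iii): build an IDCS learner on the adjusted costs. As a
    # stand-in for cslogit, each instance is weighted by its
    # misclassification cost in an ordinary logistic regression.
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=costs)

# Putting the three steps together:
#   adjusted = correct_outlying_costs(costs, detect_outlying_costs(costs))
#   model = fit_idcs_model(X, y, adjusted)

The key design point is that steps (i) and (ii) rely only on robust statistics (median, MAD), so a single extreme cost cannot shift the detection threshold or the imputed value; the downstream learner in step (iii) then never sees the raw outlying costs.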

Funding

No funds, grants, or other support was received.

Author information

Corresponding author

Correspondence to Simon De Vos.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Results on synthetic data

See Tables 5 and 6.

Table 5 Results of tests on synthetic data with no outlier and with an outlier of size 100. We apply a \(2\times 5\)-fold cross-validation procedure with a train/test split ratio of 0.8/0.2 and report the average together with the standard deviation over these 10 runs. Per row, the two classes become increasingly imbalanced. The best-performing methods are indicated in bold. In this table, r-cslogit always performs at least as well as cslogit. In terms of the cost-sensitive metric Savings, logit is always outperformed by cslogit and r-cslogit. Logit performs best in terms of cost-insensitive metrics and, given its cost-insensitive nature, its performance remains stable as the size of the outlier increases. An exception is Specificity with a 90/10 class imbalance and an outlier of 100; however, given the relatively high standard deviation of 0.11, these results are volatile because the high class imbalance leaves only a small number of observations in one class
Table 6 Results of tests on synthetic data with an outlier of size 1000 and an outlier of size 10,000. We apply a \(2\times 5\)-fold cross-validation procedure with a train/test split ratio of 0.8/0.2 and report the average together with the standard deviation over these 10 runs. Per row, the two classes become increasingly imbalanced. The best-performing methods are indicated in bold. In terms of Savings, r-cslogit always outperforms the other two methods and remains stable as the size of the outlier increases; its performance on cost-insensitive metrics is also stable. Cslogit performs worse as the outlier size increases, analogous to the results displayed in Fig. 5. Logit performs best in terms of cost-insensitive metrics and, given its cost-insensitive nature, its performance remains stable as the size of the outlier increases. The few times that logit is outperformed by either cslogit or r-cslogit in terms of cost-insensitive metrics, the performance scores are rather volatile; this is predominantly the case for tests with high class imbalance
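
For reference, the Savings metric reported in these tables is commonly defined as the fraction of misclassification cost avoided relative to the cheaper of the two trivial policies (predict everything negative or everything positive). Below is a minimal sketch under the usual assumptions: per-instance false-positive costs c_fp and false-negative costs c_fn, zero cost for correct predictions; the function names are illustrative, not taken from the paper's code.

# Sketch of the cost-sensitive Savings metric (standard definition with
# instance-dependent error costs; names here are illustrative).
import numpy as np

def total_cost(y_true, y_pred, c_fp, c_fn):
    # Sum the instance-dependent costs of the two error types.
    fp = (y_pred == 1) & (y_true == 0)
    fn = (y_pred == 0) & (y_true == 1)
    return c_fp[fp].sum() + c_fn[fn].sum()

def savings(y_true, y_pred, c_fp, c_fn):
    # Fraction of cost saved relative to the cheaper trivial classifier.
    cost_base = min(total_cost(y_true, np.zeros_like(y_true), c_fp, c_fn),
                    total_cost(y_true, np.ones_like(y_true), c_fp, c_fn))
    return 1.0 - total_cost(y_true, y_pred, c_fp, c_fn) / cost_base

A Savings of 0 means the model is no cheaper than the best trivial policy, and negative values are possible; this is why a cost-insensitive logit can score well on accuracy-type metrics in Tables 5 and 6 while still being outperformed on Savings.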

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

De Vos, S., Vanderschueren, T., Verdonck, T. et al. Robust instance-dependent cost-sensitive classification. Adv Data Anal Classif 17, 1057–1079 (2023). https://doi.org/10.1007/s11634-022-00533-3
