Cost-sensitive learning using logical analysis of data

  • Regular paper
Knowledge and Information Systems

Abstract

Classification is a common data mining task that assigns a class label to an unseen observation. It is widely used in decision making across many applications, and many machine learning algorithms have been developed to accomplish it. Classification becomes critical when the problem concerns serious situations such as fraud detection, cancer diagnosis, and quality control. Learning in these situations is characterized by predetermined asymmetric costs of incorrect class prediction, or by critical consequences associated with erroneous class prediction. In this paper, a novel approach to cost-sensitive learning is proposed. The approach employs the theory of logical analysis of data (LAD) to build accurate cost-sensitive classifiers. Two classifiers are proposed. The first is established by solving a proposed pattern selection model, the minimum misclassification cost model (MMCM), which aims at minimizing the asymmetric misclassification cost. The second is established by solving another proposed pattern selection model, the maximum precision–recall model (MPRM), which maximizes precision and recall with the aim of reaching 100% accuracy. A comparative study is conducted using real datasets. The proposed MMCM enabled LAD to achieve up to a 32.22% reduction in misclassification cost relative to the traditional implementation of LAD. Moreover, MPRM provided up to a 19.15% increase in precision and up to a 37% increase in recall. MPRM also improved the performance of LAD relative to common machine learning algorithms by providing better combinations of recall and false positive rate. This enabled LAD to provide the point closest to the optimal point on the receiver operating characteristic (ROC) diagram when compared with existing machine learning methods. Incorporating the MMCM and MPRM models into LAD establishes a novel implementation of LAD that makes it a promising cost-sensitive classifier compared to other machine learning classifiers.
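The quantities named in the abstract (asymmetric misclassification cost, precision, recall, false positive rate, and closeness to the optimal ROC point) can be illustrated with a minimal sketch. It is not the paper's MMCM or MPRM formulation, which are pattern selection models. It only shows how such criteria might be computed for a given set of binary predictions. The function names, the toy labels, and the cost values `cost_fp` and `cost_fn` are assumptions made for illustration, and the optimal ROC point is taken to be (false positive rate 0, recall 1).

```python
from math import hypot


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Compute cost-sensitive and ROC-related metrics.

    cost_fp and cost_fn are hypothetical per-error costs; a false negative
    is assumed here to be five times as costly as a false positive.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {
        # Total asymmetric misclassification cost.
        "cost": cost_fp * fp + cost_fn * fn,
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        # Euclidean distance to the ideal ROC point (fpr = 0, recall = 1).
        "roc_distance": hypot(fpr, 1.0 - recall),
    }


# Toy data (made up for illustration): 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

With these toy labels there is one false positive and one false negative, so the total cost is dominated by the single false negative. A classifier with a smaller `roc_distance` lies closer to the ideal corner of the ROC diagram.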



Acknowledgements

The author would like to acknowledge the support provided by the Deanship of Scientific Research (DSR) at King Fahd University of Petroleum & Minerals (KFUPM) for funding this work through Project No. SR141010.

Author information

Contributions

I, Hany Osman, am the sole author of this article. I did all the work, including research, analysis, writing, and modelling. This research was carried out while I was an assistant professor at King Fahd University of Petroleum and Minerals.

Corresponding author

Correspondence to Hany Osman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Osman, H. Cost-sensitive learning using logical analysis of data. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02070-1

  • DOI: https://doi.org/10.1007/s10115-024-02070-1
