Abstract
Classification is a common data mining task that assigns a class label to an unseen observation. It is widely used in decision making across many applications, and many machine learning algorithms have been developed to accomplish it. Classification becomes critical when the problem concerns serious situations such as fraud detection, cancer diagnosis, and quality control. Learning in these situations is characterized by predetermined asymmetric costs of incorrect class prediction, or by critical consequences of erroneous class prediction. In this paper, a novel cost-sensitive learning approach is proposed. The approach employs the theory of logical analysis of data (LAD) to build accurate cost-sensitive classifiers. Two classifiers are proposed. The first is obtained by solving a proposed pattern selection model, the minimum misclassification cost model (MMCM), which minimizes the asymmetric misclassification cost. The second is obtained by solving another proposed pattern selection model, the maximum precision–recall model (MPRM), which maximizes precision and recall with the aim of reaching 100% accuracy. A comparative study is conducted using real datasets. MMCM enabled LAD to achieve up to a 32.22% reduction in misclassification cost relative to the traditional implementation of LAD. Moreover, MPRM provided up to a 19.15% increase in precision and up to a 37% increase in recall. MPRM also improved the performance of LAD relative to common machine learning algorithms by providing better combinations of recall and false positive rate. This enabled LAD to provide the point closest to the optimal point on the receiver operating characteristic (ROC) diagram when compared with existing machine learning methods. Incorporating MMCM and MPRM into LAD establishes a novel implementation that makes LAD a promising cost-sensitive classifier compared to other machine learning classifiers.
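The quantities named in the abstract (asymmetric misclassification cost, precision, recall, false positive rate, and closeness to the optimal ROC point) can be illustrated with a minimal sketch. This is not the paper's MMCM or MPRM formulation. The labels, the two prediction vectors, the cost values `cost_fp` and `cost_fn`, and the use of Euclidean distance to the point (FPR = 0, recall = 1) are all assumptions chosen for illustration.

```python
from math import hypot


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Summarize a classifier with cost-sensitive metrics.

    cost_fp and cost_fn are hypothetical unit costs; a missed positive
    is assumed to be five times as costly as a false alarm.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {
        "cost": cost_fp * fp + cost_fn * fn,
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        # Distance to the ideal ROC point (FPR = 0, recall = 1).
        "roc_distance": hypot(fpr, 1.0 - recall),
    }


# Toy data (made up): four positives followed by six negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two positives
pred_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # raises two false alarms

result_a = evaluate(y_true, pred_a)
result_b = evaluate(y_true, pred_b)
print(result_a)
print(result_b)
```

Both toy classifiers make two errors, so their accuracy is identical, yet under the assumed costs the one that misses positives is far more expensive and lies farther from the optimal ROC point. That gap is what a cost-sensitive objective is meant to expose.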
Acknowledgements
The author would like to acknowledge the support provided by the Deanship of Scientific Research (DSR) at King Fahd University of Petroleum & Minerals (KFUPM) for funding this work through Project No. SR141010.
Author information
Contributions
I, Hany Osman, am the sole author of this article. I did all the work, including research, analysis, writing, and modelling. This research was carried out while I was an assistant professor at King Fahd University of Petroleum and Minerals.
Ethics declarations
Competing interests
The author declares no competing interests.
About this article
Cite this article
Osman, H. Cost-sensitive learning using logical analysis of data. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02070-1