Abstract
Fraud detection is a critical task across many domains, requiring accurate identification of fraudulent activity within large volumes of transactional data. A central challenge is the inherent class imbalance between normal and fraudulent instances. To address this issue, we propose a novel approach that combines autoencoder-based noise factor encoding (NFE) with the synthetic minority oversampling technique (SMOTE). We evaluate the approach on three datasets with severe class imbalance and compare three autoencoder variants enhanced by NFE: the autoencoder (AE), the variational autoencoder (VAE), and the contractive autoencoder (CAE). NFE trains an autoencoder on real fraud data with a noise factor added during the encoding process and then combines the resulting altered data with the genuine fraud data. SMOTE is then applied for oversampling. We assess the methods experimentally across multiple evaluation metrics. The results show that the autoencoder-based NFE approach outperforms traditional oversampling with SMOTE alone. Specifically, AE–NFE outperforms the other techniques in most cases, although VAE–NFE and CAE–NFE also show promising results in specific scenarios. By addressing class imbalance and improving the performance of fraud detection models, the approach supports more accurate identification and prevention of fraudulent activity in real-world applications.
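The data flow described in the abstract (noisy encoding of fraud samples, merging with the genuine fraud samples, then SMOTE) can be illustrated with a short sketch. This is not the authors' implementation and rests on several assumptions: a small linear autoencoder stands in for the AE, VAE, and CAE variants, the noise is injected into the latent code (the abstract does not say exactly where it enters), SMOTE is re-implemented in a few lines of NumPy, and the data are synthetic. All function names and parameter values (`noise_factor`, `latent_dim`, `k`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, latent_dim=2, noise_factor=0.1, epochs=300, lr=0.01):
    """Fit a tiny linear autoencoder by gradient descent.
    Assumption: Gaussian noise scaled by noise_factor is added to the latent code."""
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, latent_dim))
    W_dec = rng.normal(scale=0.1, size=(latent_dim, d))
    for _ in range(epochs):
        Z = X @ W_enc + noise_factor * rng.normal(size=(n, latent_dim))
        err = Z @ W_dec - X                    # reconstruction error
        grad_dec = Z.T @ err / n
        grad_enc = X.T @ (err @ W_dec.T) / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec

def noisy_reconstruct(X, W_enc, W_dec, noise_factor=0.1):
    """Encode with added noise, then decode, giving altered copies of X."""
    Z = X @ W_enc + noise_factor * rng.normal(size=(len(X), W_enc.shape[1]))
    return Z @ W_dec

def smote(X_min, n_new, k=5):
    """Minimal SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbours, at a random point on the connecting segment."""
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Synthetic, heavily imbalanced toy data (illustrative only).
X_normal = rng.normal(size=(500, 6))
X_fraud = rng.normal(loc=2.0, size=(20, 6))

# 1) Train on fraud data with noise in the encoding step.
W_enc, W_dec = train_autoencoder(X_fraud)
# 2) Generate altered fraud samples and merge them with the genuine ones.
X_nfe = noisy_reconstruct(X_fraud, W_enc, W_dec)
X_min = np.vstack([X_fraud, X_nfe])
# 3) Oversample the enlarged minority set until the classes are balanced.
X_syn = smote(X_min, n_new=len(X_normal) - len(X_min))
X_fraud_balanced = np.vstack([X_min, X_syn])
```

In practice the hand-written `smote` function would likely be replaced by a library implementation such as `imblearn.over_sampling.SMOTE`, and the linear model by a neural autoencoder; the sketch only shows the order of the steps.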
Availability of data and materials
Datasets used in this study were obtained from Kaggle’s publicly available dataset repository and can be accessed through the following links: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud, https://www.kaggle.com/datasets/ealaxi/paysim1, https://www.kaggle.com/competitions/ieee-fraud-detection/data.
Funding
No external funding was received for this study.
Author information
Contributions
All authors contributed to the conception and design of the study. All authors reviewed and approved the final version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Ethical approval
The authors confirm that this study was conducted in accordance with ethical principles and guidelines and that the appropriate ethics review was followed.
About this article
Cite this article
Çakır, M.Y., Şirin, Y. Enhanced autoencoder-based fraud detection: a novel approach with noise factor encoding and SMOTE. Knowl Inf Syst 66, 635–652 (2024). https://doi.org/10.1007/s10115-023-02016-z
DOI: https://doi.org/10.1007/s10115-023-02016-z