Enhanced autoencoder-based fraud detection: a novel approach with noise factor encoding and SMOTE

Çakır, Mert Yılmaz; Şirin, Yahya

doi:10.1007/s10115-023-02016-z

Regular Paper
Published: 27 November 2023

Knowledge and Information Systems, volume 66, pages 635–652 (2024)

Mert Yılmaz Çakır & Yahya Şirin

Abstract

Fraud detection is a critical task across various domains, requiring accurate identification of fraudulent activities within vast arrays of transactional data. The significant challenges in effectively detecting fraud stem from the inherent class imbalance between normal and fraudulent instances. To address this issue, we propose a novel approach that combines autoencoder-based noise factor encoding (NFE) with the synthetic minority oversampling technique (SMOTE). Our study evaluates the efficacy of this approach using three datasets with severe class imbalance. We compare three autoencoder variants—autoencoder (AE), variational autoencoder (VAE), and contractive autoencoder (CAE)—enhanced by the NFE technique. This technique involves training autoencoder models on real fraud data with an added noise factor during the encoding process, followed by combining this altered data with genuine fraud data. Subsequently, SMOTE is employed for oversampling. Through extensive experimentation, we assess various evaluation metrics. Our results demonstrate the superiority of the autoencoder-based NFE approach over the use of traditional oversampling methods like SMOTE alone. Specifically, the AE–NFE method outperforms other techniques in most cases, although the VAE–NFE and CAE–NFE methods also exhibit promising results in specific scenarios. This study highlights the effectiveness of leveraging autoencoder-based NFE and SMOTE for fraud detection. By addressing class imbalance and enhancing the performance of fraud detection models, our approach enables more accurate identification and prevention of fraudulent activities in real-world applications.
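The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network architectures, the noise distribution, or where exactly the noise enters, so the sketch assumes Gaussian noise scaled by a `noise_factor` and added to the encoded representation. The `encode` and `decode` callables stand in for a trained AE, VAE, or CAE (an identity placeholder is used in the usage lines), and all function names are hypothetical. The `smote` function follows the basic interpolation idea of Chawla et al. (2002): each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def noise_factor_augment(fraud_rows, encode, decode, noise_factor, rng):
    """Return the genuine fraud rows plus one noise-altered copy of each.

    Assumption: noise is zero-mean Gaussian, scaled by noise_factor, and
    added to the encoded vector before decoding.
    """
    altered = []
    for row in fraud_rows:
        code = encode(row)
        noisy_code = [z + noise_factor * rng.gauss(0.0, 1.0) for z in code]
        altered.append(decode(noisy_code))
    # Combine the altered rows with the genuine fraud rows.
    return [list(r) for r in fraud_rows] + altered


def smote(minority, n_new, k, rng):
    """Generate n_new synthetic minority rows by neighbour interpolation."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base row, excluding itself.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Usage with toy two-feature "fraud" rows and an identity placeholder
# in place of a trained autoencoder.
rng = random.Random(0)
fraud = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.2], [1.2, 2.1]]
identity = lambda v: list(v)
augmented = noise_factor_augment(fraud, identity, identity, 0.1, rng)
synthetic = smote(augmented, n_new=10, k=3, rng=rng)
```

In this sketch the noise step doubles the minority set before SMOTE runs, so the interpolation draws on a slightly wider neighbourhood than the genuine fraud rows alone would give. In practice a library implementation of SMOTE and a trained encoder and decoder would replace the toy pieces.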

Availability of data and materials

Datasets used in this study were obtained from Kaggle’s publicly available dataset repository and can be accessed through the following links: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud, https://www.kaggle.com/datasets/ealaxi/paysim1, https://www.kaggle.com/competitions/ieee-fraud-detection/data.

Funding

No external funding was received for this study.

Author information

Mert Yılmaz Çakır and Yahya Şirin have contributed equally to this work.

Authors and Affiliations

Computer Sciences and Engineering, Istanbul Sabahattin Zaim University, 34303, Halkalı, Istanbul, Turkey
Mert Yılmaz Çakır & Yahya Şirin

Contributions

All authors contributed to the conception and design of the study. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Mert Yılmaz Çakır.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Ethical approval

The authors confirm that this study was conducted in accordance with ethical principles and guidelines and that the appropriate ethics review was followed.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Çakır, M.Y., Şirin, Y. Enhanced autoencoder-based fraud detection: a novel approach with noise factor encoding and SMOTE. Knowl Inf Syst 66, 635–652 (2024). https://doi.org/10.1007/s10115-023-02016-z

Received: 05 July 2023
Revised: 24 September 2023
Accepted: 27 October 2023
Published: 27 November 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s10115-023-02016-z
