Enhanced autoencoder-based fraud detection: a novel approach with noise factor encoding and SMOTE

  • Regular Paper
Knowledge and Information Systems

Abstract

Fraud detection is a critical task across many domains, requiring accurate identification of fraudulent activities within large volumes of transactional data. A major obstacle to effective detection is the inherent class imbalance between normal and fraudulent instances. To address this issue, we propose a novel approach that combines autoencoder-based noise factor encoding (NFE) with the synthetic minority oversampling technique (SMOTE). Our study evaluates the efficacy of this approach using three datasets with severe class imbalance. We compare three autoencoder variants, namely the autoencoder (AE), the variational autoencoder (VAE), and the contractive autoencoder (CAE), each enhanced by the NFE technique. NFE trains an autoencoder on real fraud data with a noise factor added during the encoding process, and the resulting altered data are then combined with the genuine fraud data. Subsequently, SMOTE is applied to oversample this combined set. Through extensive experimentation, we assess the approach on several evaluation metrics. Our results show that the autoencoder-based NFE approach outperforms traditional oversampling with SMOTE alone. Specifically, the AE–NFE method outperforms the other techniques in most cases, although the VAE–NFE and CAE–NFE methods also show promising results in specific scenarios. This study highlights the effectiveness of combining autoencoder-based NFE with SMOTE for fraud detection. By addressing class imbalance and improving the performance of fraud detection models, our approach enables more accurate identification and prevention of fraudulent activities in real-world applications.
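The pipeline described in the abstract (train an autoencoder on fraud samples, add noise during encoding, merge the altered samples with the genuine ones, then oversample) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic Gaussians, the autoencoder is a tiny linear model trained by plain gradient descent, the noise is Gaussian noise scaled by a hypothetical `noise_factor` and added to the latent code, and `smote_like` is a simplified SMOTE-style interpolation rather than a library implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear_autoencoder(X, latent_dim=2, lr=0.01, epochs=200):
    """Fit a tiny linear autoencoder (no biases) by gradient descent on MSE."""
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, latent_dim))
    W_dec = rng.normal(scale=0.1, size=(latent_dim, d))
    for _ in range(epochs):
        Z = X @ W_enc                      # encode
        err = Z @ W_dec - X                # reconstruction error
        grad_dec = Z.T @ err / n
        grad_enc = X.T @ (err @ W_dec.T) / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec

def noisy_reconstructions(X, W_enc, W_dec, noise_factor=0.1):
    """Encode X, perturb the latent code with scaled Gaussian noise, decode."""
    Z = X @ W_enc
    Z_noisy = Z + noise_factor * rng.normal(size=Z.shape)
    return Z_noisy @ W_dec

def smote_like(X_min, n_new, k=3):
    """Interpolate between minority samples and one of their k nearest neighbours."""
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)        # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))           # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Synthetic stand-in data (assumption): 200 normal rows, 10 fraud rows.
X_normal = rng.normal(0.0, 1.0, size=(200, 4))
X_fraud = rng.normal(3.0, 1.0, size=(10, 4))

W_enc, W_dec = train_linear_autoencoder(X_fraud)
X_altered = noisy_reconstructions(X_fraud, W_enc, W_dec)
X_fraud_aug = np.vstack([X_fraud, X_altered])          # genuine + altered fraud
X_synth = smote_like(X_fraud_aug, len(X_normal) - len(X_fraud_aug))
X_fraud_balanced = np.vstack([X_fraud_aug, X_synth])   # matches the normal class size
```

In practice the linear model would likely be replaced by a deeper AE, VAE, or CAE, and the interpolation step by an established SMOTE implementation; the sketch only shows the order of the steps and how the two augmentation stages feed into each other.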

Availability of data and materials

Datasets used in this study were obtained from Kaggle’s publicly available dataset repository and can be accessed through the following links: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud, https://www.kaggle.com/datasets/ealaxi/paysim1, https://www.kaggle.com/competitions/ieee-fraud-detection/data.

Notes

  1. https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud.

  2. https://www.kaggle.com/datasets/ealaxi/paysim1.

  3. https://www.kaggle.com/competitions/ieee-fraud-detection/data.

Funding

No external funding was received for this study.

Author information

Contributions

All authors contributed to the conception and design of the study. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Mert Yılmaz Çakır.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Ethical approval

The authors confirm that this study was conducted in accordance with ethical principles and guidelines and that the appropriate ethics review was followed.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Çakır, M.Y., Şirin, Y. Enhanced autoencoder-based fraud detection: a novel approach with noise factor encoding and SMOTE. Knowl Inf Syst 66, 635–652 (2024). https://doi.org/10.1007/s10115-023-02016-z

  • DOI: https://doi.org/10.1007/s10115-023-02016-z
