Classification of Imbalanced Data Using SMOTE and AutoEncoder Based Deep Convolutional Neural Network, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems



International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IF 1.5), Pub Date: 2023-07-03, DOI: 10.1142/s0218488523500228
Suja A. Alex, J. Jesu Vedha Nayahi


Imbalanced data classification is a challenging problem in many domains, including intelligent medical diagnosis and fraudulent transaction analysis. The performance of conventional classifiers degrades when the class distribution of the training data set is imbalanced. Machine learning and deep learning techniques have recently been applied to imbalanced data classification, and data preprocessing approaches are also suitable for handling the class imbalance problem. Data augmentation is one such preprocessing technique for handling a skewed class distribution. The Synthetic Minority Oversampling Technique (SMOTE) is a promising class-balancing approach, but it generates noise while creating synthetic samples. In this paper, an AutoEncoder is used as a noise reduction technique to reduce the noise generated by SMOTE, and a deep one-dimensional Convolutional Neural Network is then used for classification. The performance of the proposed method is evaluated and compared with existing approaches using Precision, Recall, Accuracy, Area Under the Curve (AUC) and Geometric Mean (G-mean). The experiments use ten data sets with imbalance ratios ranging from 1.17 to 577.87 and sizes ranging from 303 to 284807 instances: Heart-Disease, Mammography, Pima Indian diabetes, Adult, Oil-Spill, Phoneme, Creditcard, BankNoteAuthentication, Balance scale weight & distance database and Yeast. On these data sets the proposed method achieves accuracies of 96.1%, 96.5%, 87.7%, 87.3%, 95%, 92.4%, 98.4%, 86.1%, 94% and 95.9% respectively. The results suggest that this method outperforms other deep learning and machine learning methods with respect to G-mean and other performance metrics.
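The abstract names SMOTE, the imbalance ratio and G-mean without defining them, so a small illustration may help. The following is a minimal pure-Python sketch, not the authors' implementation: it assumes the basic SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours, the usual majority-to-minority definition of the imbalance ratio, and the common binary definition of G-mean as the square root of sensitivity times specificity. The function names and the parameters `k` and `seed` are hypothetical, and the AutoEncoder denoising and one-dimensional CNN stages are omitted.

```python
import math
import random


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts.values()) / min(counts.values())


def smote_samples(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (basic SMOTE idea).
    Requires at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)
```

Because each synthetic point lies on a line segment between two real minority samples, it can land in a region that overlaps the majority class, which is one way to read the "noise" that the paper's AutoEncoder stage is meant to reduce. G-mean is useful here because, unlike accuracy, it drops to zero when a classifier ignores either class.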


Updated: 2023-07-03
