A quantum-based oversampling method for classification of highly imbalanced and overlapped data
Experimental Biology and Medicine (IF 3.2), Pub Date: 2024-03-01, DOI: 10.1177/15353702231220665
Bei Yang 1, Guilan Tian 1, Joseph Luttrell 2, Ping Gong 3, Chaoyang Zhang 2

Data imbalance is a challenging problem in classification tasks, and when combined with class overlapping, it further deteriorates classification performance. However, existing studies have rarely addressed both issues simultaneously. In this article, we propose a novel quantum-based oversampling method (QOSM) to effectively tackle data imbalance and class overlapping, thereby improving classification performance. QOSM utilizes the quantum potential theory to calculate the potential energy of each sample and selects the sample with the lowest potential as the center of each cover generated by a constructive covering algorithm. This approach optimizes cover center selection and better captures the distribution of the original samples, particularly in the overlapping regions. In addition, oversampling is performed on the samples of the minority class covers to mitigate the imbalance ratio (IR). We evaluated QOSM using three traditional classifiers (support vector machines [SVM], k-nearest neighbor [KNN], and naive Bayes [NB] classifier) on 10 publicly available KEEL data sets characterized by high IRs and varying degrees of overlap. Experimental results demonstrate that QOSM significantly improves classification accuracy compared to approaches that do not address class imbalance and overlapping. Moreover, QOSM consistently outperforms existing oversampling methods tested. With its compatibility with different classifiers, QOSM exhibits promising potential to improve the classification performance of highly imbalanced and overlapped data.
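
The center-selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the potential formula, so the sketch assumes the form commonly used in quantum clustering: a wave function built as a sum of Gaussians of width `sigma` centered on the samples, with the potential of a sample computed (up to an additive constant) as the kernel-weighted mean of its squared distances to all samples, divided by `2 * sigma**2`. The names `quantum_potential` and `imbalance_ratio`, the value of `sigma`, and the toy data are all hypothetical. The constructive covering and oversampling steps are not implemented here.

```python
import numpy as np


def quantum_potential(X, sigma=1.0):
    """Potential of each sample, shifted so the minimum is zero.

    Assumed form (quantum-clustering style), not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Gaussian kernel terms; their row sums give the wave function psi.
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    psi = weights.sum(axis=1)
    v = (sq_dist * weights).sum(axis=1) / (2.0 * sigma ** 2 * psi)
    return v - v.min()


def imbalance_ratio(y):
    """Majority-class count divided by minority-class count."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.min()


# Toy minority-class samples: one central point surrounded by four others.
X_min = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
potential = quantum_potential(X_min, sigma=1.0)

# The lowest-potential sample is the candidate cover center.
center_idx = int(np.argmin(potential))

# Toy labels: 20 majority samples and 5 minority samples.
y = np.array([0] * 20 + [1] * 5)
ir = imbalance_ratio(y)
```

On this symmetric toy set the central sample (index 2) has the lowest potential, and the 20-to-5 class split gives an IR of 4. Note that the result depends on `sigma`: with a small width, an isolated sample can sit in its own potential well, so the choice of `sigma` would need tuning on real data.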

Updated: 2024-03-01