Online active learning method for multi-class imbalanced data stream, Knowledge and Information Systems



Knowledge and Information Systems (IF 2.7), Pub Date: 2023-12-23, DOI: 10.1007/s10115-023-02027-w
Ang Li, Meng Han, Dongliang Mu, Zhihui Gao, Shujuan Liu

In data mining, data stream classification is an important research direction. However, multi-class imbalance, concept drift, and a variable class imbalance ratio in data streams can greatly degrade the performance of classification models, and the high cost of sample labeling has long been a focus of research. To address these problems, an online active learning method for multi-class imbalanced data streams (OALM-MI) is proposed. First, a comprehensive sample weighting method based on cross-entropy and margin values weights each incoming sample according to its classification difficulty and importance, which aims to enhance the classifier's ability to learn from important samples. Second, a comprehensive weighting and updating strategy for ensemble classifiers is introduced, which combines mean square error, improved square error, recall, and the weights of the classifiers in the previous sliding window of samples to weight and update the classifiers. Third, an adaptive window is used to detect and handle concept drift, enabling better adaptation to changes in the data stream during learning. Finally, a margin matrix label request strategy based on the class imbalance ratio is proposed to request labels for samples according to their imbalance ratio and classification difficulty, which provides more learning opportunities for minority class samples and important samples. Comprehensive experiments were conducted on 12 synthetic data streams and six real data streams against seven state-of-the-art algorithms, and the results showed that OALM-MI achieved the highest performance in terms of recall, precision, F1-score, Kappa, and G-mean.
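
The abstract names the ingredients of the sample weighting and label request steps (cross-entropy, margin, class imbalance ratio) but does not give the formulas. The following is only a rough sketch of how such quantities might be combined, not the OALM-MI method itself: the function names, the additive weight, and the `base_threshold` parameter are all assumptions made for illustration.

```python
import math


def cross_entropy(probs, true_label):
    """Cross-entropy loss of one predicted distribution against its true class index."""
    eps = 1e-12  # guard against log(0)
    return -math.log(max(probs[true_label], eps))


def margin(probs):
    """Gap between the two largest predicted class probabilities."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def sample_weight(probs, true_label):
    """Hypothetical combined weight (an assumption, not the paper's formula):
    grows with the loss and with the uncertainty (small margin)."""
    return 1.0 + cross_entropy(probs, true_label) + (1.0 - margin(probs))


def should_request_label(probs, class_counts, base_threshold=0.3):
    """Hypothetical query rule (an assumption, not the paper's margin matrix):
    ask for a label when the margin falls below a threshold that is
    raised when the predicted class has been seen rarely so far."""
    predicted = max(range(len(probs)), key=probs.__getitem__)
    total = sum(class_counts)
    if total == 0:
        return True  # nothing labeled yet, so always query
    share = class_counts[predicted] / total
    threshold = base_threshold * (1.0 + (1.0 - share))
    return margin(probs) < threshold
```

Under these assumptions, a confidently and correctly classified sample receives a weight close to 1, while a misclassified sample with a small margin receives a larger one, and a prediction for a rarely seen class is queried at margins where a majority-class prediction would not be.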


Updated: 2023-12-24
