LSROM: Learning Self-Refined Organizing Map for Fast Imbalanced Streaming Data Clustering
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2024-04-14, DOI: arxiv-2404.09243
Yongqi Xu, Yujian Lee, Rong Zou, Yiqun Zhang, Yiu-Ming Cheung

Streaming data clustering is a popular research topic in data mining and machine learning. Compared with static data, streaming data, which is usually analyzed in chunks, is more susceptible to the dynamic cluster imbalance issue: the degree of imbalance among clusters varies from chunk to chunk, which degrades either the accuracy or the efficiency of streaming data analysis based on existing clustering methods. We therefore propose an efficient approach called Learning Self-Refined Organizing Map (LSROM) to handle imbalanced streaming data clustering, in which an advanced SOM is proposed to represent the global data distribution. The constructed SOM is first refined to guide the partition of the dataset into many micro-clusters, so that small clusters in imbalanced data are not missed. The micro-clusters are then merged efficiently through quick retrieval based on the SOM, which automatically yields the true number of imbalanced clusters. Compared with existing imbalanced data clustering approaches, LSROM has a lower time complexity of $O(n\log n)$ while achieving very competitive clustering accuracy. Moreover, LSROM is interpretable and insensitive to hyper-parameters. Extensive experiments have verified its efficacy.
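The abstract describes the pipeline only at a high level: build a SOM over a data chunk, use its neurons to split the chunk into many micro-clusters, then merge micro-clusters by searching over the SOM. The Python sketch below illustrates that general micro-cluster-then-merge idea under stated assumptions; it is not the authors' LSROM. A plain online SOM on a 4×4 grid stands in for the self-refined SOM, "refinement" is reduced to discarding neurons that win no points, and merging is single-linkage over prototype distances with a fixed, hypothetical `merge_radius`. The function names `train_som`, `nearest_unit`, `merge_units` and `cluster_chunk` are hypothetical. This sketch costs O(n·m) per epoch for m neurons plus O(m²) for the merge, so it does not reproduce the paper's $O(n\log n)$ bound.

```python
import math
import random


def nearest_unit(x, protos):
    """Index of the best-matching unit (closest prototype) for point x."""
    return min(range(len(protos)), key=lambda j: math.dist(x, protos[j]))


def train_som(points, rows=4, cols=4, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Plain online SOM on a rows x cols grid (a stand-in, not LSROM's SOM)."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise prototypes from randomly chosen data points.
    protos = [list(p) for p in rng.sample(points, len(grid))]
    total = epochs * len(points)
    step = 0
    for _ in range(epochs):
        order = list(points)
        rng.shuffle(order)
        for x in order:
            t = step / total
            lr = lr0 * (1 - t) + 0.01        # learning rate decays linearly
            sigma = sigma0 * (1 - t) + 0.05  # neighbourhood width shrinks
            br, bc = grid[nearest_unit(x, protos)]
            for j, (r, c) in enumerate(grid):
                # Gaussian neighbourhood measured on the grid, not in data space.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                protos[j] = [w + lr * h * (xi - w) for w, xi in zip(protos[j], x)]
            step += 1
    return protos


def merge_units(protos, units, merge_radius):
    """Single-linkage merge of micro-clusters whose prototypes lie within merge_radius."""
    parent = {u: u for u in units}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    ordered = sorted(units)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if math.dist(protos[a], protos[b]) < merge_radius:
                parent[find(a)] = find(b)
    # Relabel the surviving roots as consecutive cluster ids 0, 1, ...
    root_ids = {}
    return {u: root_ids.setdefault(find(u), len(root_ids)) for u in ordered}


def cluster_chunk(points, merge_radius=3.0, seed=0):
    protos = train_som(points, seed=seed)
    # Micro-clusters: each point goes to its best-matching unit.
    bmus = [nearest_unit(x, protos) for x in points]
    # Simplified "refinement": keep only neurons that won at least one point.
    active = set(bmus)
    unit_label = merge_units(protos, active, merge_radius)
    return [unit_label[b] for b in bmus]


# Imbalanced toy chunk: 200 points around (0, 0), 20 points around (10, 10).
rng = random.Random(42)
big = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
small = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(20)]
labels = cluster_chunk(big + small)
print("clusters found:", len(set(labels)))
```

The number of clusters is not given in advance: it emerges from how many groups of surviving neurons the merge step links together. Because every point is assigned to its nearest surviving neuron, a neuron that settles on the 20-point minority group is not absorbed into the majority. The fixed `merge_radius` is the weakest part of this sketch; the paper states that LSROM itself is insensitive to hyper-parameters.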

Updated: 2024-04-16