Bio-Inspired Algorithm Based Undersampling Approach and Ensemble Learning for Twitter Spam Detection
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IF 1.5), Pub Date: 2024-02-20, DOI: 10.1142/s0218488524500016
K. Kiruthika Devi, G. A. Sathish Kumar

Social media networks such as Facebook and Twitter have become valuable platforms for global communication. However, because of its extensive user base, Twitter is often misused by illegitimate users engaged in illicit activities. Although numerous studies address the problem of illegitimate users on Twitter, most of them fail to handle class imbalance, which significantly degrades the effectiveness of spam detection. The few works that do address class imbalance have not yet applied bio-inspired algorithms to balance the dataset. We therefore introduce PSOB-U, a particle swarm optimization (PSO)-based undersampling technique for balancing the Twitter dataset. PSOB-U uses various classifiers and metrics to select and rank majority-class samples. We further combine the base classifiers through a three-stage ensemble learning approach. When training the base classifiers, we use undersampling techniques and a cost-sensitive random forest (CS-RF) to handle the imbalance at both the data level and the algorithmic level. In the first stage, the imbalanced dataset is balanced in three ways: random undersampling, PSO-based undersampling, and random oversampling. In the second stage, a classifier is built on each of the resulting balanced datasets. In the third stage, majority voting aggregates the predictions of the three classifiers. The evaluation results show that the proposed method significantly improves the detection of illegitimate users in the imbalanced Twitter dataset. We also compare our approach with existing models; the results show that it outperforms state-of-the-art spam detection models that address the class imbalance problem.
The combination of particle swarm optimization-based undersampling and the ensemble learning approach using majority voting results in more accurate spam detection.
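The abstract does not specify how PSOB-U encodes particles or scores them, so the following is only a minimal Python sketch of one plausible reading. It uses the common binary-PSO variant, in which the sigmoid of each velocity component is the probability of keeping a sample. Each particle holds one keep/drop bit per majority-class (legitimate) sample, and fitness is the validation G-mean of a hypothetical nearest-centroid classifier minus a penalty for straying from a 1:1 class ratio. PSOB-U itself uses several classifiers and metrics to select and rank samples; the function names, fitness definition, hyperparameters, and toy data below are assumptions, not the authors' implementation.

```python
import math
import random


def fit_centroids(train):
    """Hypothetical base classifier: the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], x)))


def g_mean(centroids, val):
    """Geometric mean of recall on spam (label 1) and recall on legitimate users (label 0)."""
    tp = fn = tn = fp = 0
    for x, y in val:
        p = predict(centroids, x)
        if y == 1:
            tp, fn = tp + int(p == 1), fn + int(p != 1)
        else:
            tn, fp = tn + int(p == 0), fp + int(p != 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def fitness(mask, majority, minority, val):
    """Score a keep/drop mask over majority samples: validation G-mean minus an imbalance penalty."""
    kept = [s for s, keep in zip(majority, mask) if keep]
    if not kept:
        return -1.0  # worse than any non-empty subset, whose score is at least -0.5
    centroids = fit_centroids(kept + minority)
    imbalance = abs(len(kept) - len(minority)) / len(majority)
    return g_mean(centroids, val) - 0.5 * imbalance


def psob_undersample(majority, minority, val, n_particles=10, iters=30,
                     w=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=0):
    """Binary-PSO undersampling sketch: return the majority samples kept by the best particle."""
    rng = random.Random(seed)
    n = len(majority)
    p_keep = len(minority) / n  # initialise particles near a 1:1 class ratio
    pos = [[int(rng.random() < p_keep) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, majority, minority, val) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(n):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-vmax, min(vmax, v))  # clamp so every bit keeps some chance to flip
                # Binary PSO: sigmoid(velocity) is the probability of keeping sample j.
                pos[i][j] = int(rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j])))
            f = fitness(pos[i], majority, minority, val)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return [s for s, keep in zip(majority, gbest) if keep]


def blob(rng, center, n, label):
    """Toy 2-D Gaussian cluster of n (features, label) samples."""
    return [([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label) for _ in range(n)]


data_rng = random.Random(42)
majority = blob(data_rng, 0.0, 40, 0)  # toy legitimate accounts
minority = blob(data_rng, 5.0, 10, 1)  # toy spam accounts
val = blob(data_rng, 0.0, 20, 0) + blob(data_rng, 5.0, 5, 1)
selected = psob_undersample(majority, minority, val)
print(f"kept {len(selected)} of {len(majority)} majority samples")
```

Because the penalty term favours subsets near the minority-class size while the G-mean term favours subsets that still separate the classes on validation data, the swarm tends toward small, balanced selections of legitimate accounts. Reproducing the paper would mean swapping in its own classifiers, ranking metrics, and settings.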

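To make the three stages concrete, here is a minimal, non-authoritative Python sketch on toy 2-D data. Stage 1 builds three balanced copies of the training set (random undersampling, a PSOB-U slot, and random oversampling), stage 2 fits one base classifier per copy, and stage 3 combines them by majority vote. The PSOB-U slot is a hypothetical placeholder that simply reuses random undersampling with a different seed, and the nearest-centroid learner stands in for the paper's base classifiers (which include a cost-sensitive random forest); all function names and data are illustrative assumptions.

```python
import random
from collections import Counter


def split_by_class(data):
    """Separate (features, label) pairs into majority (label 0) and minority (label 1)."""
    return [s for s in data if s[1] == 0], [s for s in data if s[1] == 1]


def random_undersample(data, seed=0):
    """Stage 1: randomly drop legitimate (label 0) samples until both classes are equal."""
    majority, minority = split_by_class(data)
    return random.Random(seed).sample(majority, len(minority)) + minority


def psob_u_placeholder(data, seed=1):
    """Stage 1 slot for PSOB-U. Hypothetical stand-in: random undersampling with another seed."""
    return random_undersample(data, seed=seed)


def random_oversample(data, seed=0):
    """Stage 1: randomly duplicate spam (label 1) samples until both classes are equal."""
    majority, minority = split_by_class(data)
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def fit_nearest_centroid(train):
    """Stage 2 stand-in base learner: predicts the class with the closest mean vector."""
    sums, counts = {}, Counter()
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(x):
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], x)))

    return predict


def train_ensemble(data):
    """Stages 1 and 2: one balanced copy of the data per resampler, one classifier per copy."""
    resamplers = [random_undersample, psob_u_placeholder, random_oversample]
    return [fit_nearest_centroid(resample(data)) for resample in resamplers]


def majority_vote(models, x):
    """Stage 3: return the label predicted by most of the base classifiers."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


rng = random.Random(7)
train = ([([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(60)]    # toy legitimate users
         + [([rng.gauss(4, 1), rng.gauss(4, 1)], 1) for _ in range(12)])  # toy spammers
models = train_ensemble(train)
print(majority_vote(models, [0.2, -0.1]), majority_vote(models, [4.1, 3.8]))
```

With three voters and two classes a tie cannot occur, which is one practical reason an odd number of base classifiers suits plain majority voting.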



Updated: 2024-02-20