Comparing Machine Learning and Deep Learning Techniques for Text Analytics: Detecting the Severity of Hate Comments Online
Information Systems Frontiers (IF 5.9), Pub Date: 2023-11-24, DOI: 10.1007/s10796-023-10446-x
Alaa Marshan, Farah Nasreen Mohamed Nizar, Athina Ioannou, Konstantina Spanaki

Social media platforms have become an increasingly popular tool for individuals to share their thoughts and opinions with other people. However, people often misuse social media by posting abusive comments. Abusive and harassing behaviours can have adverse effects on people's lives. This study takes a novel approach to combating harassment on online platforms: detecting the severity of abusive comments, which has not been investigated before. The study compares the performance of machine learning models such as Naïve Bayes, Random Forest, and Support Vector Machine with deep learning models such as Convolutional Neural Network (CNN) and Bi-directional Long Short-Term Memory (Bi-LSTM). Moreover, this work investigates the effect of text pre-processing on the performance of the machine learning and deep learning models. The feature set for the abusive comments was built using unigrams and bigrams for the machine learning models and word embeddings for the deep learning models. The comparison of the models' performances showed that Random Forest with bigrams achieved the best overall performance, with an accuracy of 0.94, a precision of 0.91, a recall of 0.94, and an F1 score of 0.92. The study develops an efficient model to detect the severity of abusive language on online platforms, offering important implications for both theory and practice.
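
To make the terms in the abstract concrete, the following is a minimal sketch of unigram and bigram feature extraction and of the four reported metrics. It is not the authors' pipeline. The function names, the whitespace tokenisation, the example comment, the `severe`/`mild` labels, and the choice to score a single class as positive are all assumptions made for illustration. The abstract does not state how the study tokenised text or averaged metrics across severity levels.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, n_values=(1, 2)):
    """Bag-of-n-grams counts for one comment.

    Assumption: lowercasing plus whitespace splitting stands in for
    whatever pre-processing the study actually applied.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    return counts


def classification_metrics(y_true, y_pred, positive):
    """Accuracy, plus precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical comment and labels, used only to exercise the functions.
features = ngram_features("You are not welcome here")
y_true = ["severe", "mild", "severe", "mild"]
y_pred = ["severe", "severe", "severe", "mild"]
metrics = classification_metrics(y_true, y_pred, positive="severe")
```

Here `features` holds counts for five unigrams and four bigrams, for example `("not", "welcome")`. A classifier such as Random Forest would consume these counts as a sparse vector, and its predictions would be scored as in `classification_metrics`.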




Updated: 2023-11-26