Suicidality Detection on Social Media Using Metadata and Text Feature Extraction and Machine Learning
Archives of Suicide Research (IF 2.833), Pub Date: 2021-07-28, DOI: 10.1080/13811118.2021.1955783
Woojin Jung, Donghun Kim, Seojin Nam, Yongjun Zhu

Abstract

In this study, we implemented machine learning models that detect suicidality-related posts on Twitter. We randomly selected and annotated 20,000 tweets and explored metadata and text features to build effective models. Metadata features were studied in detail to understand their potential and importance in suicidality detection models. Results showed that posting type (i.e., reply or not) and time-related features such as the month, the day of the week, and the time of day (AM vs. PM) were the most important metadata features in suicidality detection models. Specifically, the probability that a social media post is suicidal is higher if the post is a reply to other users rather than an original tweet. Moreover, tweets created in the afternoon, on Fridays and weekends, and in the fall have higher probabilities of being detected as suicidality tweets than those created at other times. By integrating metadata and text features, we obtained a model with good performance (an F1 score of 0.846) that can assist humans in real-world settings in detecting suicidality-related social media posts.
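
To make the metadata features named in the abstract concrete, the following is a minimal sketch of how posting type and time-related features might be derived from a single post. It is an illustration only, not the authors' pipeline: the function name `metadata_features`, the ISO 8601 timestamp input, and the integer encoding are all assumptions, and the abstract does not say how the features were actually encoded.

```python
from datetime import datetime


def metadata_features(created_at: str, is_reply: bool) -> dict:
    """Derive the metadata features named in the abstract from one post.

    Hypothetical helper: `created_at` is assumed to be an ISO 8601
    timestamp in the poster's local time, and the integer encoding
    below is an illustrative choice, not the paper's.
    """
    ts = datetime.fromisoformat(created_at)
    return {
        "is_reply": int(is_reply),    # posting type: reply (1) or original (0)
        "month": ts.month,            # 1 = January ... 12 = December
        "day_of_week": ts.weekday(),  # 0 = Monday ... 6 = Sunday
        "is_pm": int(ts.hour >= 12),  # time of day: AM (0) vs. PM (1)
    }


# A reply posted on a Friday afternoon in July 2021.
example = metadata_features("2021-07-30T15:42:00", is_reply=True)
print(example)
```

In a full model, a vector like this would presumably be concatenated with text features before training a classifier; the abstract does not specify which text features or classifier were used.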



Updated: 2021-07-28