Utilising crowdsourcing and text mining to enhance information extraction from social media: A case study in handling COVID-19 supply requests in Thailand
Journal of Information Science ( IF 2.4 ) Pub Date : 2024-01-06 , DOI: 10.1177/01655515231220164
Prapaporn Rattanatamrong 1 , Yutthana Boonpalit 1 , Manassanan Boonnavasin 1

Social media platforms are critical for disaster communication and relief efforts, and effective disaster response requires rapid and precise analysis of social media posts. This article presents a comprehensive study of a framework that combines crowdsourcing and text mining techniques to enhance data extraction from social media. The research focuses on a case study of medical supply requests during the COVID-19 pandemic and yields several key findings. First, incorporating domain-specific data when training named entity recognition (NER) models is essential for accurately identifying and retrieving important entities, such as the names of medical supplies and hospitals. Second, a hybrid system improves the extraction of information from social media posts. Finally, crowdsourcing plays a significant role in validating, verifying and filtering disorganised information within the hybrid system. Our performance analysis demonstrates that hybrid models can significantly improve the extraction of supply names (by up to 37%) and hospital names (by up to 66%), especially in the absence of a comprehensive vocabulary or specially trained NER models. During the COVID-19 supply shortage in Thailand, volunteers used hybrid models to expedite the identification of the necessary information. Experimental results demonstrated significant improvement in the accuracy of extracted data, together with the ability to acquire relevant information in real time, the capacity to handle a substantial number of posts and the practical benefit of the proposed framework.
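
The abstract does not spell out how the hybrid system is built, so the following is only a minimal sketch of the general idea under stated assumptions: candidates from a vocabulary lookup are merged with candidates from a model, and crowd votes then filter the result. The vocabulary, the `toy_model` stand-in for a trained NER model, the English example post and the majority-vote threshold are all hypothetical and are not taken from the paper.

```python
# Hypothetical vocabulary; the paper's actual lexicon is not given in the abstract.
SUPPLY_VOCAB = {"n95 mask", "ppe suit", "oxygen tank"}


def dictionary_extract(post, vocab):
    """Return the vocabulary terms that occur in the post (case-insensitive)."""
    text = post.lower()
    return {term for term in vocab if term in text}


def toy_model(post):
    """Stand-in for a trained NER model (assumption): returns the word after 'need'."""
    words = post.lower().split()
    return [words[i + 1] for i, w in enumerate(words[:-1]) if w == "need"]


def hybrid_extract(post, vocab, model_extract):
    """Union of dictionary matches and model predictions."""
    model_terms = {t.lower() for t in model_extract(post)}
    return dictionary_extract(post, vocab) | model_terms


def crowd_filter(candidates, votes, min_agreement=0.5):
    """Keep candidates accepted by more than min_agreement of their voters.

    votes maps a candidate to a list of booleans, one per volunteer.
    Candidates with no votes are dropped.
    """
    kept = set()
    for candidate in candidates:
        ballots = votes.get(candidate, [])
        if ballots and sum(ballots) / len(ballots) > min_agreement:
            kept.add(candidate)
    return kept


post = "We need ventilators and N95 mask at the ward"
candidates = hybrid_extract(post, SUPPLY_VOCAB, toy_model)
votes = {"ventilators": [True, False, False], "n95 mask": [True, True]}
validated = crowd_filter(candidates, votes)
```

In this toy run the model contributes a term that the vocabulary lacks, the vocabulary contributes a multi-word term that the model misses, and the crowd step discards the candidate that most volunteers rejected.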

Updated: 2024-01-06