Infectious risk events and their novelty in event-based surveillance: new definitions and annotated corpus
Language Resources and Evaluation (IF 2.7), Pub Date: 2024-03-05, DOI: 10.1007/s10579-024-09728-w
François Delon, Gabriel Bédubourg, Léo Bouscarrat, Jean-Baptiste Meynard, Aude Valois, Benjamin Queyriaux, Carlos Ramisch, Marc Tanti

Event-based surveillance (EBS) requires analysing an ever-increasing volume of documents, which calls for automated processing to support human analysts. Few annotated corpora are available for evaluating information extraction tools in the EBS domain. The main objective of this work was to build a corpus of documents representative of those collected by current EBS information systems, and to annotate them with events and their novelty. We proposed new definitions of infectious events and of their novelty, suited to the day-to-day work of EBS analysts, and we compiled a corpus of 305 documents describing 283 infectious events. Of these, 36 documents, describing a total of 11 events, were in French; the remainder were in English. We annotated novelty for the 110 most recent documents in the corpus, which describe 101 events. Inter-annotator agreement was 0.74 (F1-score) for event identification and 0.69 (kappa; 95% CI: 0.51–0.88) for novelty annotation. Overall agreement for entity annotation was lower, with significant variation depending on the entity type (range 0.30–0.68). This corpus is a useful tool for developing and evaluating the algorithms and methods proposed by EBS research teams for event detection and novelty annotation, with the aim of improving the operational processing of document flows. Its small size makes it less suitable for training natural language processing models, although this limitation matters less with few-shot learning methods.
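The two agreement measures reported in the abstract (an F1-score for event identification and Cohen's kappa for novelty) can be illustrated with a minimal sketch. It assumes that each annotator's events have already been reduced to comparable identifiers matched exactly, and that novelty is one categorical label per item. The function names and toy labels below are hypothetical and are not taken from the corpus; the authors' event-matching criteria and their confidence-interval method are not reproduced here.

```python
from collections import Counter
from typing import Hashable, Sequence, Set


def pairwise_f1(events_a: Set[Hashable], events_b: Set[Hashable]) -> float:
    """F1 between two annotators' event sets; symmetric: 2|A∩B| / (|A| + |B|).

    Assumes events are hypothetical identifiers compared by exact match.
    """
    if not events_a and not events_b:
        return 1.0  # both annotated nothing: treated here as perfect agreement
    overlap = len(events_a & events_b)
    return 2 * overlap / (len(events_a) + len(events_b))


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy, hypothetical annotations (not from the corpus).
    a_events, b_events = {"E1", "E2", "E3"}, {"E1", "E2", "E4"}
    a_novelty = ["new", "new", "update", "update", "new", "update"]
    b_novelty = ["new", "update", "update", "update", "new", "new"]
    print(round(pairwise_f1(a_events, b_events), 2))  # 0.67
    print(round(cohen_kappa(a_novelty, b_novelty), 2))  # 0.33
```

Using F1 rather than kappa for event identification is a common choice when there is no fixed set of items to label: the annotators may each find different events, so only the overlap between their event sets can be scored.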




Updated: 2024-03-05