A span-based model for extracting overlapping PICO entities from RCT publications
Journal of the American Medical Informatics Association (IF 6.4), Pub Date: 2024-03-12, DOI: 10.1093/jamia/ocae065
Gongbo Zhang, Yiliang Zhou, Yan Hu, Hua Xu, Chunhua Weng, Yifan Peng

Objectives: Extracting PICO (Populations, Interventions, Comparison, and Outcomes) entities is fundamental to evidence retrieval. We present a novel method, PICOX, to extract overlapping PICO entities.

Materials and Methods: PICOX first identifies entities by assessing whether a word marks the beginning or end of an entity. Then, it uses a multi-label classifier to assign one or more PICO labels to each span candidate. PICOX was evaluated against one of the best-performing baselines on the EBM-NLP dataset and on three more datasets, i.e., PICO-Corpus and RCT publications on Alzheimer’s Disease (AD) or COVID-19, using entity-level precision, recall, and F1 scores.

Results: PICOX achieved superior precision, recall, and F1 scores across the board, with the micro F1 score improving from 45.05 to 50.87 (p ≪ .01). On the PICO-Corpus, PICOX obtained higher recall and F1 scores than the baseline and improved the micro recall score from 56.66 to 67.33. On the COVID-19 dataset, PICOX also outperformed the baseline and improved the micro F1 score from 77.10 to 80.32. On the AD dataset, PICOX achieved F1 scores comparable to the baseline, with higher precision.

Conclusion: PICOX excels at identifying overlapping entities and consistently surpasses a leading baseline across multiple datasets. Ablation studies reveal that its data augmentation strategy effectively reduces false positives and improves precision.
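To make the two-step idea in the abstract concrete, the following is a minimal toy sketch and not the authors' implementation. It pairs predicted start and end positions into span candidates, keeps every label whose score clears a threshold (so one span can carry several labels and spans can overlap), and scores the result with entity-level micro precision, recall, and F1. The function names, the `max_len` and `threshold` parameters, the hand-written boundary flags and scores, and the exact-match scoring criterion are all assumptions made for illustration; in PICOX these decisions come from trained models.

```python
from itertools import product


def candidate_spans(start_flags, end_flags, max_len=10):
    """Pair each predicted start with each predicted end at or after it.

    max_len is a hypothetical cap on span length (in tokens).
    """
    starts = [i for i, flag in enumerate(start_flags) if flag]
    ends = [j for j, flag in enumerate(end_flags) if flag]
    return [(i, j) for i, j in product(starts, ends) if i <= j < i + max_len]


def assign_labels(span_scores, threshold=0.5):
    """Multi-label step: keep every (span, label) whose score clears the threshold."""
    entities = set()
    for (start, end), scores in span_scores.items():
        for label, score in scores.items():
            if score >= threshold:
                entities.add((start, end, label))
    return entities


def micro_prf(gold, pred):
    """Entity-level micro precision, recall, F1 over exact (start, end, label) matches."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy sentence; the flags and scores below are hand-written stand-ins for model output.
tokens = ["older", "adults", "with", "diabetes", "received", "metformin"]
start_flags = [1, 0, 0, 1, 0, 1]
end_flags = [0, 1, 0, 1, 0, 1]

spans = candidate_spans(start_flags, end_flags)

span_scores = {
    (0, 1): {"P": 0.3},
    (0, 3): {"P": 0.9, "I": 0.1},   # "older adults with diabetes"
    (3, 3): {"P": 0.7, "O": 0.2},   # "diabetes", nested inside the span above
    (5, 5): {"I": 0.9, "C": 0.6},   # one span, two labels
}
predicted = assign_labels(span_scores)

gold = {(0, 3, "P"), (5, 5, "I")}
precision, recall, f1 = micro_prf(gold, predicted)
print(spans)
print(sorted(predicted))
print(precision, recall, f1)
```

Representing each entity as a `(start, end, label)` triple is what lets nested and multiply-labeled spans coexist, which a single tag per token cannot express.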

Updated: 2024-03-12