Annotator-Centric Active Learning for Subjective NLP Tasks
arXiv - CS - Computation and Language. Pub Date: 2024-04-24. DOI: arxiv-2404.15720
Michiel van der Meer, Neele Falk, Pradeep K. Murukannaiah, Enrico Liscio

To accurately capture the variability in human judgments for subjective NLP tasks, incorporating a wide range of perspectives in the annotation process is crucial. Active Learning (AL) addresses the high costs of collecting human annotations by strategically annotating the most informative samples. We introduce Annotator-Centric Active Learning (ACAL), which incorporates an annotator selection strategy following data sampling. Our objective is two-fold: (1) to efficiently approximate the full diversity of human judgments, and (2) to assess model performance using annotator-centric metrics, which emphasize minority perspectives over a majority. We experiment with multiple annotator selection strategies across seven subjective NLP tasks, employing both traditional and novel, human-centered evaluation metrics. Our findings indicate that ACAL improves data efficiency and excels in annotator-centric performance evaluations. However, its success depends on the availability of a sufficiently large and diverse pool of annotators to sample from.
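
The two-step idea in the abstract (sample a data point, then choose which annotator labels it) can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not name the selection strategies, so the "least-queried annotator first" rule, the random data sampling step, and all names here (`select_annotator`, `acal_round`, `run_acal`, the `pool` layout) are assumptions made purely for illustration.

```python
import random
from collections import Counter


def select_annotator(candidates, usage):
    """Hypothetical strategy: prefer the annotator queried least so far.

    Ties are broken by annotator id so the choice is deterministic.
    """
    return min(candidates, key=lambda a: (usage[a], a))


def acal_round(pool, usage, rng):
    """One round: sample an item first, then choose who annotates it.

    `pool` maps item id -> {annotator id: label} for labels not yet queried.
    """
    item = rng.choice(sorted(pool))       # placeholder for the data sampling step
    annotator = select_annotator(pool[item], usage)
    label = pool[item].pop(annotator)     # simulate querying that annotator
    if not pool[item]:                    # drop items with no annotators left
        del pool[item]
    usage[annotator] += 1
    return item, annotator, label


def run_acal(pool, rounds, seed=0):
    """Run up to `rounds` rounds; note that `pool` is consumed in place."""
    rng = random.Random(seed)
    usage = Counter()
    collected = []
    for _ in range(rounds):
        if not pool:
            break
        collected.append(acal_round(pool, usage, rng))
    return collected, usage
```

In a real setup the random sampling line would be replaced by an informativeness criterion from the model, and the selection rule by whichever annotator strategy is being studied. The sketch also makes the abstract's caveat concrete: if each item has only a few candidate annotators, any selection rule has little room to choose.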

