Document Set Expansion with Positive-Unlabelled Learning Using Intractable Density Estimation,arXiv - CS - Information Retrieval


Document Set Expansion with Positive-Unlabelled Learning Using Intractable Density Estimation
arXiv - CS - Information Retrieval, Pub Date: 2024-03-26, DOI: arxiv-2403.17473
Haiyang Zhang, Qiuyi Chen, Yuanjie Zou, Yushan Pan, Jia Wang, Mark Stevenson

The Document Set Expansion (DSE) task involves identifying relevant documents from large collections based on a limited set of example documents. Previous research has highlighted Positive and Unlabeled (PU) learning as a promising approach for this task. However, most PU methods rely on the unrealistic assumption of knowing the class prior for positive samples in the collection. To address this limitation, this paper introduces a novel PU learning framework that utilizes intractable density estimation models. Experiments conducted on PubMed and Covid datasets in a transductive setting showcase the effectiveness of the proposed method for DSE. Code is available from https://github.com/Beautifuldog01/Document-set-expansion-puDE.
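To illustrate why a density-based view can avoid the class prior, here is a minimal sketch. It is not the paper's method or the code in the linked repository. It assumes the labelled positives are a random sample of all positives, so that by Bayes' rule P(y=1|x) = π · p_pos(x) / p(x), and ranking unlabelled items by the ratio p_pos(x) / p(x) gives the same order as ranking by the posterior without knowing π. The 2-D toy vectors stand in for document embeddings, and the simple Gaussian kernel density estimator, the function names `gaussian_kde` and `rank_unlabelled`, and the bandwidth value are all hypothetical choices made for illustration.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Return a density estimate built from `points` with an isotropic Gaussian kernel."""
    dim = len(points[0])
    norm = len(points) * (bandwidth * math.sqrt(2 * math.pi)) ** dim

    def density(x):
        total = 0.0
        for p in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        return total / norm

    return density


def rank_unlabelled(positives, unlabelled, bandwidth=0.5):
    """Rank unlabelled items by the ratio p_pos(x) / p_unl(x), highest first.

    The ratio is proportional to P(y=1|x) under the sampling assumption
    stated above, so no class prior is needed to produce the ranking.
    """
    p_pos = gaussian_kde(positives, bandwidth)
    p_unl = gaussian_kde(unlabelled, bandwidth)
    scores = [p_pos(x) / (p_unl(x) + 1e-12) for x in unlabelled]
    order = sorted(range(len(unlabelled)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy transductive setup (assumed data): 20 labelled positives, and an
# unlabelled pool holding 10 hidden positives followed by 40 negatives.
rng = random.Random(0)
positives = [(rng.gauss(2, 0.3), rng.gauss(2, 0.3)) for _ in range(20)]
hidden_pos = [(rng.gauss(2, 0.3), rng.gauss(2, 0.3)) for _ in range(10)]
negatives = [(rng.gauss(-2, 0.3), rng.gauss(-2, 0.3)) for _ in range(40)]
unlabelled = hidden_pos + negatives

order, scores = rank_unlabelled(positives, unlabelled)
top10 = set(order[:10])  # indices of the ten highest-scoring unlabelled items
```

In this toy setup the two clusters are well separated, so the hidden positives (indices 0 to 9 of the pool) occupy the top of the ranking. Real document embeddings are high-dimensional, where a plain kernel estimator degrades quickly, which is one reason a learned density model would be preferred in practice.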


Updated: 2024-03-27
