Passage-aware Search Result Diversification
ACM Transactions on Information Systems (IF 5.6), Pub Date: 2024-03-21, DOI: 10.1145/3653672
Zhan Su 1 , Zhicheng Dou 2 , Yutao Zhu 2 , Ji-Rong Wen 3

Research on search result diversification aims to increase the variety of subtopics covered by a list of search results. Existing studies usually treat a document as a whole and represent it with one fixed-length vector. However, because a long document can cover several aspects of a query, a single vector is usually insufficient to represent it. To tackle this problem, we propose to exploit multiple passages to better represent documents in search result diversification. Different passages of a document may reflect different subtopics of the query, and comparing passages can improve result diversity. Specifically, we segment each document into multiple passages and train a classifier to filter out the irrelevant ones. Document diversity is then measured based on several passages that can satisfy the information needs of the query. We further devise a passage-aware search result diversification framework that takes into account the topic information contained in the selected document sequence and the candidate documents. The novelty of each candidate document is evaluated based on its passages while considering the dynamically selected document sequence. We conducted experiments on a commonly used dataset, and the results indicate that our proposed method outperforms leading existing methods.
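
The pipeline described in the abstract (segment documents into passages, filter out irrelevant passages, then score each candidate's novelty against the documents already selected) can be illustrated with a small sketch. This is not the authors' model. It assumes fixed-size word windows as passages, a simple query-term overlap test as a stand-in for the trained classifier, bag-of-words cosine similarity in place of learned representations, and a greedy MMR-style loop in place of the proposed framework. All function names and the `trade_off` parameter are hypothetical.

```python
import math
from collections import Counter


def split_into_passages(text, size=8):
    """Split a document into consecutive windows of `size` words."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def relevant_passages(doc, query, size=8):
    """Keep passages sharing a query term (assumed stand-in for a trained classifier)."""
    q = set(query.lower().split())
    return [Counter(p) for p in split_into_passages(doc, size) if q & set(p)]


def novelty(cand_passages, selected_passages):
    """Best passage-level novelty: 1 minus the closest match among selected passages."""
    if not cand_passages:
        return 0.0
    if not selected_passages:
        return 1.0
    return max(1.0 - max(cosine(p, s) for s in selected_passages)
               for p in cand_passages)


def diversify(query, docs, k=2, trade_off=0.5, size=8):
    """Greedy MMR-style selection that scores novelty at the passage level."""
    q_vec = Counter(query.lower().split())
    passages = {d: relevant_passages(text, query, size) for d, text in docs.items()}
    relevance = {d: max((cosine(q_vec, p) for p in ps), default=0.0)
                 for d, ps in passages.items()}
    selected, selected_passages = [], []
    remaining = sorted(docs)  # fixed order so ties break deterministically
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda d: trade_off * relevance[d]
                   + (1 - trade_off) * novelty(passages[d], selected_passages))
        selected.append(best)
        selected_passages.extend(passages[best])  # the selected sequence grows each step
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    docs = {
        "d1": "jaguar car engine speed price",
        "d2": "jaguar car engine speed price",
        "d3": "jaguar cat jungle habitat prey",
    }
    print(diversify("jaguar", docs, k=2))
```

In this toy run, `d2` duplicates `d1`, so after `d1` is selected its novelty drops to roughly zero and `d3`, which covers a different sense of the query, is picked second. Because novelty is recomputed against the growing selected sequence at every step, the sketch mirrors the "dynamically selected document sequence" idea only in spirit.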




Updated: 2024-03-24