An Analysis on Matching Mechanisms and Token Pruning for Late-interaction Models,arXiv - CS - Information Retrieval


An Analysis on Matching Mechanisms and Token Pruning for Late-interaction Models
arXiv - CS - Information Retrieval. Pub Date: 2024-03-20, DOI: arxiv-2403.13291
Qi Liu, Gang Guo, Jiaxin Mao, Zhicheng Dou, Ji-Rong Wen, Hao Jiang, Xinyu Zhang, Zhao Cao

With the development of pre-trained language models, dense retrieval models have become promising alternatives to traditional retrieval models that rely on exact match and sparse bag-of-words representations. Unlike most dense retrieval models, which use a bi-encoder to encode each query or document into a single dense vector, the recently proposed late-interaction multi-vector models (i.e., ColBERT and COIL) achieve state-of-the-art retrieval effectiveness by representing documents and queries with all of their token embeddings and modeling relevance with a sum-of-max operation. However, these fine-grained representations may cause unacceptable storage overhead for practical search systems. In this study, we systematically analyze the matching mechanism of these late-interaction models and show that the sum-of-max operation relies heavily on co-occurrence signals and on a few important words in the document. Based on these findings, we propose several simple document pruning methods to reduce the storage overhead and compare the effectiveness of different pruning methods on different late-interaction models. We also leverage query pruning methods to further reduce retrieval latency. We conduct extensive experiments on both in-domain and out-of-domain datasets and show that some of these pruning methods can significantly improve the efficiency of late-interaction models without substantially hurting their retrieval effectiveness.
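
To make the sum-of-max operation and the idea of document token pruning concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sum_of_max`, `prune_top_k`), the toy two-dimensional embeddings, and the per-token importance weights are all hypothetical, and the abstract does not specify which pruning criteria the paper evaluates.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sum_of_max(query: List[Vector], doc: List[Vector]) -> float:
    """Late-interaction score: for each query token embedding, take the
    maximum similarity over all document token embeddings, then sum."""
    if not doc:
        return 0.0
    return sum(max(dot(q, d) for d in doc) for q in query)


def prune_top_k(doc: List[Vector], weights: List[float], k: int) -> List[Vector]:
    """Keep the k document tokens with the largest importance weights,
    preserving their original order. The weights are a hypothetical
    stand-in for whatever importance signal a pruning method uses."""
    ranked = sorted(range(len(doc)), key=lambda i: weights[i], reverse=True)
    keep = sorted(ranked[:k])
    return [doc[i] for i in keep]


# Toy data (made up for illustration): 2 query tokens, 3 document tokens.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]
weights = [0.7, 0.6, 0.1]  # hypothetical per-token importance scores

full_score = sum_of_max(query, doc)
pruned_doc = prune_top_k(doc, weights, k=2)
pruned_score = sum_of_max(query, pruned_doc)
```

In this toy case the dropped token is never the best match for any query token, so the pruned document stores one fewer vector while its score is unchanged. That is the kind of trade-off the abstract describes: only a few document tokens need to survive for the sum-of-max score to be largely preserved.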


Updated: 2024-03-21
