RankMamba, Benchmarking Mamba's Document Ranking Performance in the Era of Transformers
arXiv - CS - Information Retrieval | Pub Date: 2024-03-27 | DOI: arXiv-2403.18276
Zhichao Xu

The Transformer architecture has achieved great success in multiple applied machine learning communities, such as natural language processing (NLP), computer vision (CV), and information retrieval (IR). The Transformer's core mechanism, attention, requires $O(n^2)$ time complexity in training and $O(n)$ time complexity in inference. Many works have been proposed to improve the attention mechanism's scalability, such as Flash Attention and Multi-query Attention. A different line of work aims to design new mechanisms to replace attention altogether. Recently, a notable model structure, Mamba, which is based on state space models, has achieved Transformer-equivalent performance on multiple sequence modeling tasks. In this work, we examine Mamba's efficacy through the lens of a classical IR task: document ranking. A reranker model takes a query and a document as input and predicts a scalar relevance score. This task demands the language model's ability to comprehend lengthy contextual inputs and to capture the interaction between query and document tokens. We find that (1) Mamba models achieve competitive performance compared to Transformer-based models trained with the same recipe, (2) but also have lower training throughput than efficient Transformer implementations such as Flash Attention. We hope this study can serve as a starting point for exploring Mamba models in other classical IR tasks. Our code implementation and trained checkpoints are publicly available to facilitate reproducibility: https://github.com/zhichaoxu-shufe/RankMamba.
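To make the reranking setup described above concrete, here is a minimal cross-encoder sketch in Python. It is an illustration only, not the paper's released RankMamba code: it assumes the Hugging Face transformers library and a generic sequence-classification reranker checkpoint (the model name below is a placeholder). The model scores each (query, document) pair with a single scalar, and documents are sorted by that score.

```python
# Minimal cross-encoder reranker sketch (illustrative; not the paper's released code).
# Assumes the Hugging Face `transformers` library and a sequence-classification
# checkpoint that outputs one relevance logit per (query, document) pair.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # placeholder reranker checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def rerank(query: str, documents: list[str]) -> list[tuple[str, float]]:
    """Score each (query, document) pair with a scalar relevance score and sort."""
    inputs = tokenizer(
        [query] * len(documents),   # query repeated for each candidate document
        documents,                  # candidate documents as the second segment
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)  # one scalar per pair
    return sorted(zip(documents, scores.tolist()), key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    docs = [
        "Mamba is a state space model for sequence modeling.",
        "Flash Attention is an IO-aware exact attention algorithm.",
    ]
    for doc, score in rerank("What is Mamba?", docs):
        print(f"{score:.3f}\t{doc}")
```

Note that every candidate document requires a full forward pass over the concatenated query and document, which is why long-context efficiency of the backbone matters for this task.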

Updated: 2024-03-28