Abstract
With the development of pre-trained language models, dense retrieval models have become promising alternatives to traditional retrieval models that rely on exact match and sparse bag-of-words representations. Unlike most dense retrieval models, which use a bi-encoder to encode each query or document into a single dense vector, the recently proposed late-interaction multi-vector models (i.e., ColBERT and COIL) achieve state-of-the-art retrieval effectiveness by representing documents and queries with all of their token embeddings and modeling relevance with a sum-of-max operation. However, these fine-grained representations may incur unacceptable storage overhead for practical search systems. In this study, we systematically analyze the matching mechanism of these late-interaction models and show that the sum-of-max operation relies heavily on co-occurrence signals and on a small number of important words in the document. Based on these findings, we propose several simple document pruning methods to reduce the storage overhead and compare the effectiveness of different pruning methods across different late-interaction models. We also leverage query pruning methods to further reduce retrieval latency. We conduct extensive experiments on both in-domain and out-of-domain datasets and show that several of these pruning methods can significantly improve the efficiency of late-interaction models without substantially hurting their retrieval effectiveness.
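To make the sum-of-max (MaxSim) operation and the pruning idea concrete, below is a minimal NumPy sketch of late-interaction scoring combined with a simple top-k document token pruning step. This is an illustration under assumed inputs (random embeddings and a generic per-token importance signal), not the authors' implementation; the function names and the choice of importance score are hypothetical.

```python
import numpy as np

def sum_of_max_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction relevance: for every query token embedding, take its
    maximum similarity over all document token embeddings, then sum these
    per-query-token maxima (the sum-of-max / MaxSim operation)."""
    sim = query_emb @ doc_emb.T          # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())  # max over doc tokens, sum over query tokens

def prune_doc_tokens(doc_emb: np.ndarray, importance: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the `keep` document token embeddings with the highest
    importance scores, shrinking the stored index. The importance signal
    (e.g., IDF or an attention weight) is a placeholder assumption here."""
    top = np.argsort(-importance)[:keep]
    return doc_emb[top]

# Toy usage with random, L2-normalized embeddings (so dot product = cosine).
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 128))
d = rng.normal(size=(180, 128))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)
print(sum_of_max_score(q, d))                                             # full document
print(sum_of_max_score(q, prune_doc_tokens(d, rng.random(180), keep=50)))  # pruned document
```

Query pruning works analogously on the other axis of the similarity matrix: dropping low-importance query token embeddings removes rows from `sim`, reducing the number of max operations per scored document and hence retrieval latency.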