research-article

An Analysis on Matching Mechanisms and Token Pruning for Late-interaction Models

Published: 29 April 2024

Abstract

With the development of pre-trained language models, dense retrieval models have become promising alternatives to traditional retrieval models that rely on exact matching and sparse bag-of-words representations. Unlike most dense retrieval models, which use a bi-encoder to encode each query or document into a single dense vector, the recently proposed late-interaction multi-vector models (i.e., ColBERT and COIL) achieve state-of-the-art retrieval effectiveness by representing documents and queries with all of their token embeddings and modeling their relevance with a sum-of-max operation. However, these fine-grained representations may incur unacceptable storage overhead for practical search systems. In this study, we systematically analyze the matching mechanism of these late-interaction models and show that the sum-of-max operation relies heavily on co-occurrence signals and on certain important words in the document. Based on these findings, we propose several simple document pruning methods to reduce the storage overhead and compare the effectiveness of different pruning methods across late-interaction models. We also leverage query pruning methods to further reduce retrieval latency. Extensive experiments on both in-domain and out-of-domain datasets show that several of these pruning methods can significantly improve the efficiency of late-interaction models without substantially hurting their retrieval effectiveness.
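As context for the abstract, the sum-of-max (MaxSim) operation of late-interaction models can be sketched in a few lines of NumPy: each query token embedding takes its maximum similarity over all document token embeddings, and these maxima are summed. The `prune_doc_tokens` helper below is a hypothetical position-based baseline (keep only the first k document token embeddings) that merely illustrates the kind of simple document pruning the study compares; the actual pruning criteria examined in the paper may differ.

```python
import numpy as np

def maxsim_score(q: np.ndarray, d: np.ndarray) -> float:
    """Sum-of-max (MaxSim) relevance: q is (n_q, dim) query token
    embeddings, d is (n_d, dim) document token embeddings. For each
    query token, take the max dot-product similarity over all document
    tokens, then sum over query tokens."""
    sim = q @ d.T                     # (n_q, n_d) token-level similarities
    return float(sim.max(axis=1).sum())

def prune_doc_tokens(d: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical position-based pruning baseline: keep only the
    first k token embeddings, shrinking the stored representation."""
    return d[:k]

# Toy example: two query tokens, three document tokens.
q = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
d = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0]])
print(maxsim_score(q, d))                        # → 2.0
print(maxsim_score(q, prune_doc_tokens(d, 2)))   # → 2.0 (score preserved here)
```

In this toy case, pruning the third document token leaves the score unchanged because each query token's best match is among the kept tokens; the paper's question is how often that holds at scale for different pruning criteria.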



    • Published in

ACM Transactions on Information Systems, Volume 42, Issue 5
September 2024
612 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/3618083

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 29 April 2024
      • Online AM: 31 January 2024
      • Accepted: 21 December 2023
      • Revised: 31 October 2023
      • Received: 15 March 2023
