DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment

arXiv - CS - Information Retrieval. Pub Date: 2024-03-27, DOI: arxiv-2403.18435
Haitao Li, Qingyao Ai, Xinyan Han, Jia Chen, Qian Dong, Yiqun Liu, Chong Chen, Qi Tian

Recent research demonstrates the effectiveness of pre-trained language models for legal case retrieval. Most existing work focuses on improving the representation ability of the contextualized embedding of the [CLS] token and computes relevance from textual semantic similarity. However, in the legal domain, textual semantic similarity does not always imply that two cases are sufficiently relevant. Instead, relevance between legal cases depends primarily on the similarity of the key facts that affect the final judgment. Without proper treatment, the discriminative ability of the learned representations can be limited, since legal cases are lengthy and contain numerous non-key facts. To this end, we introduce DELTA, a discriminative model designed for legal case retrieval. The basic idea is to pinpoint key facts in legal cases and pull the contextualized embedding of the [CLS] token closer to the key facts while pushing it away from the non-key facts, which can warm up the case embedding space in an unsupervised manner. Specifically, this study brings the word alignment mechanism to the contextual masked auto-encoder. First, we leverage shallow decoders to create information bottlenecks, aiming to enhance representation ability. Second, we employ a deep decoder to enable translation between different structures, with the goal of pinpointing key facts to enhance discriminative ability. Comprehensive experiments on publicly available legal benchmarks show that our approach can outperform existing state-of-the-art methods in legal case retrieval. It provides a new perspective on the in-depth understanding and processing of legal case documents.
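The abstract does not state the training objective, so the following is only a minimal sketch of the "pull toward key facts, push away from non-key facts" idea, written as an assumed InfoNCE-style contrastive loss over cosine similarities. It is not DELTA's actual objective, which the abstract describes as built on word alignment inside a contextual masked auto-encoder. The function names, the `temperature` parameter, and the toy two-dimensional vectors are all hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(cls_vec, key_vecs, non_key_vecs, temperature=0.1):
    """Assumed contrastive loss (illustrative, not taken from the paper).

    The loss is small when the [CLS] vector is similar to the key-fact
    vectors and dissimilar to the non-key-fact vectors.
    """
    pos = [math.exp(cosine(cls_vec, k) / temperature) for k in key_vecs]
    neg = [math.exp(cosine(cls_vec, n) / temperature) for n in non_key_vecs]
    return -math.log(sum(pos) / (sum(pos) + sum(neg)))


# Toy embeddings: key facts lie along the first axis, non-key facts
# along the second axis (hypothetical data).
key_facts = [[1.0, 0.0], [0.9, 0.2]]
non_key_facts = [[0.0, 1.0], [0.1, 0.9]]

cls_near_key = [1.0, 0.1]    # a [CLS] vector aligned with the key facts
cls_near_noise = [0.1, 1.0]  # a [CLS] vector aligned with the non-key facts

loss_good = alignment_loss(cls_near_key, key_facts, non_key_facts)
loss_bad = alignment_loss(cls_near_noise, key_facts, non_key_facts)
print(loss_good < loss_bad)
```

Under these assumptions, minimizing such a loss would move the case-level embedding toward the tokens identified as key facts, which is the discriminative behavior the abstract motivates. How key facts are identified without supervision is the part the paper attributes to the deep decoder and structural word alignment, and this sketch does not attempt to reproduce it.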


Updated: 2024-03-28