Improving extractive summarization with semantic enhancement through topic-injection based BERT model
Information Processing & Management (IF 8.6), Pub Date: 2024-02-08, DOI: 10.1016/j.ipm.2024.103677
Yiming Wang, Jindong Zhang, Zhiyao Yang, Bing Wang, Jingyi Jin, Yitong Liu

In the field of text summarization, extractive techniques aim to extract key sentences from a document to form a summary. However, traditional methods are not sensitive enough to capture the core semantics of the text, resulting in summaries that are difficult to comprehend. Recently, topic extraction techniques have been used to extract the core semantics of a text, enabling summaries that accurately reflect the main points of a document. In this paper, we introduce Topic-Injected Bidirectional Encoder Representations from Transformers (TP-BERT), a novel neural auto-encoder model designed explicitly for extractive summarization. TP-BERT integrates document-related topic words into sentences, improving contextual understanding and more accurately aligning summaries with a document’s main theme, addressing a key shortfall of traditional extractive methods. Another major innovation of TP-BERT is the use of contrastive learning during training. This method enhances summarization efficiency by giving prominence to key sentences and minimizing peripheral information. Additionally, we conducted ablation and parameter studies of TP-BERT on the CNN/DailyMail, WikiHow, and XSum datasets. In our two main experiments, the average ROUGE-F1 score improved by 2.69 and 0.45 across the three datasets. Compared with baseline methods, TP-BERT demonstrates better performance, as reflected by higher ROUGE-F1 scores on all three datasets. Moreover, the improved semantic differentiation between sentence representations also contributes to the performance gains.
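The abstract describes two mechanisms: injecting document-level topic words into sentences before BERT encoding, and a contrastive objective over sentence representations. The sketch below illustrates both ideas in a minimal form; the specific injection format, topic-extraction method, and InfoNCE-style loss are assumptions for illustration, not the paper's actual TP-BERT architecture.

```python
# Minimal sketch of (1) topic-word injection before BERT sentence encoding and
# (2) a contrastive loss separating summary-worthy sentences from peripheral ones.
# The injection format, topic source, and loss form are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

def encode_with_topics(sentences, topic_words):
    """Prepend document topic words to every sentence and return its [CLS] vector."""
    topic_prefix = " ".join(topic_words)
    texts = [f"{topic_prefix} [SEP] {s}" for s in sentences]  # assumed injection format
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    out = encoder(**batch)
    return out.last_hidden_state[:, 0]  # one [CLS] vector per sentence

def contrastive_loss(sent_vecs, labels, temperature=0.1):
    """InfoNCE-style loss: sentences sharing a label attract each other and are
    contrasted against the rest (label 1 = in summary, 0 = peripheral)."""
    z = F.normalize(sent_vecs, dim=-1)
    sim = z @ z.T / temperature
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(0)                              # exclude self-pairs
    logits = sim - 1e9 * torch.eye(len(z))                  # mask self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    denom = pos_mask.sum(dim=1).clamp(min=1)
    return -(pos_mask * log_prob).sum(dim=1).div(denom).mean()

# Toy usage: two summary-worthy sentences and one peripheral one.
sentences = ["The central bank raised interest rates by 0.5 points.",
             "The decision aims to curb persistent inflation.",
             "The press briefing was held in a crowded room."]
topics = ["interest", "rates", "inflation", "bank"]
vecs = encode_with_topics(sentences, topics)
labels = torch.tensor([1, 1, 0])
print(contrastive_loss(vecs, labels))
```

In a full training loop the encoder would be fine-tuned jointly with a sentence-scoring head; the snippet only shows how topic injection and the contrastive term could fit together.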

Updated: 2024-02-08