Enhancing low-resource cross-lingual summarization from noisy data with fine-grained reinforcement learning
Frontiers of Information Technology & Electronic Engineering (IF 3) Pub Date: 2023-12-27, DOI: 10.1631/fitee.2300296
Yuxin Huang, Huailing Gu, Zhengtao Yu, Yumeng Gao, Tong Pan, Jialong Xu

Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a document in a source language. Recently, end-to-end CLS models have achieved impressive results using large-scale, high-quality datasets typically constructed by translating monolingual summary corpora into CLS corpora. However, due to the limited performance of low-resource language translation models, translation noise can seriously degrade the performance of these models. In this paper, we propose a fine-grained reinforcement learning approach to address low-resource CLS based on noisy data. We introduce the source language summary as a gold signal to alleviate the impact of the translated noisy target summary. Specifically, we design a reinforcement reward by calculating the word correlation and word missing degree between the source language summary and the generated target language summary, and combine it with cross-entropy loss to optimize the CLS model. To validate the performance of our proposed model, we construct Chinese-Vietnamese and Vietnamese-Chinese CLS datasets. Experimental results show that our proposed model outperforms the baselines in terms of both the ROUGE score and BERTScore.
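The sketch below illustrates, under stated assumptions, how such a fine-grained reward could be combined with cross-entropy loss: a word-correlation term and a word-missing-degree term are computed between the source-language summary and the generated target-language summary, and folded into training via a REINFORCE-style surrogate. The bilingual similarity table `sim`, the threshold `thr`, and the weights `alpha` and `lam` are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

# Hypothetical bilingual similarity table: sim[(source_word, target_word)] -> [0, 1]

def word_correlation(src_tokens, gen_tokens, sim):
    """Average best-match similarity of each generated target-language token
    against the source-language summary (word-correlation term of the reward)."""
    if not src_tokens or not gen_tokens:
        return 0.0
    return float(np.mean([max(sim.get((s, g), 0.0) for s in src_tokens)
                          for g in gen_tokens]))

def word_missing_degree(src_tokens, gen_tokens, sim, thr=0.5):
    """Fraction of source-summary tokens with no sufficiently similar
    counterpart in the generated summary (word-missing-degree term)."""
    if not src_tokens:
        return 0.0
    missing = sum(
        1 for s in src_tokens
        if not gen_tokens or max(sim.get((s, g), 0.0) for g in gen_tokens) < thr
    )
    return missing / len(src_tokens)

def mixed_loss(ce_loss, seq_log_prob, src_tokens, gen_tokens, sim,
               alpha=0.5, lam=0.7):
    """Blend cross-entropy with a policy-gradient term whose reward mixes
    word correlation and (negated) word missing degree."""
    reward = alpha * word_correlation(src_tokens, gen_tokens, sim) \
             - (1.0 - alpha) * word_missing_degree(src_tokens, gen_tokens, sim)
    rl_loss = -reward * seq_log_prob          # REINFORCE-style surrogate
    return lam * ce_loss + (1.0 - lam) * rl_loss

# Toy usage with a tiny made-up Chinese-Vietnamese similarity table
sim = {("河内", "Hà"): 0.9, ("河内", "Nội"): 0.8, ("天气", "thời"): 0.7}
print(mixed_loss(ce_loss=2.1, seq_log_prob=-35.0,
                 src_tokens=["河内", "天气"], gen_tokens=["Hà", "Nội", "thời"],
                 sim=sim))
```

In this reading, the reward rises when generated target-language words align well with the source-language summary and falls when source-summary words go uncovered, which is one plausible way to use the source summary as a gold signal against translation noise in the target summary.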



Updated: 2023-12-29