MIND Your Language: A Multilingual Dataset for Cross-lingual News Recommendation,arXiv - CS - Information Retrieval



arXiv - CS - Information Retrieval. Pub Date: 2024-03-26, DOI: arxiv-2403.17876
Andreea Iana, Goran Glavaš, Heiko Paulheim

Digital news platforms use news recommenders as the main instrument to cater to the individual information needs of readers. Despite an increasingly language-diverse online community, in which many Internet users consume news in multiple languages, the majority of news recommendation research focuses on major, resource-rich languages, and English in particular. Moreover, nearly all news recommendation efforts assume monolingual news consumption, whereas more and more users tend to consume information in at least two languages. Accordingly, the existing body of work on news recommendation suffers from a lack of publicly available multilingual benchmarks that would catalyze the development of news recommenders effective in multilingual settings and for low-resource languages. Aiming to fill this gap, we introduce xMIND, an open, multilingual news recommendation dataset derived from the English MIND dataset using machine translation, covering a set of 14 linguistically and geographically diverse languages, with digital footprints of varying sizes. Using xMIND, we systematically benchmark several state-of-the-art content-based neural news recommenders (NNRs) in both zero-shot (ZS-XLT) and few-shot (FS-XLT) cross-lingual transfer scenarios, considering both monolingual and bilingual news consumption patterns. Our findings reveal that (i) current NNRs, even when based on a multilingual language model, suffer from substantial performance losses under ZS-XLT and that (ii) inclusion of target-language data in FS-XLT training has limited benefits, particularly when combined with bilingual news consumption. Our findings thus warrant a broader research effort in multilingual and cross-lingual news recommendation. The xMIND dataset is available at https://github.com/andreeaiana/xMIND.
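To make the bilingual news consumption pattern mentioned in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each translated article can be looked up by the same news ID as its English original; the function name `mix_history`, the `tgt_ratio` parameter, and the toy data are all hypothetical.

```python
import random


def mix_history(history, en_news, tgt_news, tgt_ratio, seed=0):
    """Build a bilingual click history (hypothetical helper).

    history   : list of news IDs the user clicked, in order
    en_news   : dict mapping news ID -> English title
    tgt_news  : dict mapping news ID -> target-language title
                (assumed to share IDs with en_news)
    tgt_ratio : fraction of the history shown in the target language
    """
    rng = random.Random(seed)
    n_tgt = round(len(history) * tgt_ratio)
    # Sample positions rather than IDs so repeated IDs are handled correctly.
    tgt_positions = set(rng.sample(range(len(history)), n_tgt))
    return [
        tgt_news[nid] if pos in tgt_positions else en_news[nid]
        for pos, nid in enumerate(history)
    ]


# Toy data; the "[tgt]" prefix stands in for a real translation.
en_news = {f"N{i}": f"english title {i}" for i in range(1, 5)}
tgt_news = {nid: f"[tgt] {title}" for nid, title in en_news.items()}
history = ["N1", "N2", "N3", "N4"]

monolingual = mix_history(history, en_news, tgt_news, tgt_ratio=0.0)
bilingual = mix_history(history, en_news, tgt_news, tgt_ratio=0.5)
```

With `tgt_ratio=0.0` the history stays fully English, which roughly corresponds to a monolingual source-language setup; values between 0 and 1 simulate a reader who consumes news in two languages.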


Updated: 2024-03-27
