Improving Out-of-Vocabulary Handling in Recommendation Systems
arXiv - CS - Information Retrieval | Pub Date: 2024-03-27 | DOI: arxiv-2403.18280
William Shiao, Mingxuan Ju, Zhichun Guo, Xin Chen, Evangelos Papalexakis, Tong Zhao, Neil Shah, Yozen Liu

Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not contain enough information to produce high-quality recommendations. This work focuses on a complementary problem: recommending new users and items unseen (out-of-vocabulary, or OOV) at training time. This setting is known as the inductive setting and is especially problematic for factorization-based models, which rely on encoding only those users/items seen at training time with fixed parameter vectors. Many existing solutions applied in practice are often naive, such as assigning OOV users/items to random buckets. In this work, we tackle this problem and propose approaches that better leverage available user/item features to improve OOV handling at the embedding table level. We discuss general-purpose plug-and-play approaches that are easily applicable to most RS models and improve inductive performance without negatively impacting transductive model performance. We extensively evaluate 9 OOV embedding methods on 5 models across 4 datasets (spanning different domains). One of these datasets is a proprietary production dataset from a prominent RS employed by a large social platform serving hundreds of millions of daily active users. In our experiments, we find that several proposed methods that exploit feature similarity using LSH consistently outperform alternatives on most model-dataset combinations, with the best method showing a mean improvement of 3.74% over the industry standard baseline in inductive performance. We release our code and hope our work helps practitioners make more informed decisions when handling OOV for their RS and further inspires academic research into improving OOV support in RS.
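To make the contrast concrete, below is a minimal sketch (not the authors' released code) of the idea discussed in the abstract: instead of assigning an out-of-vocabulary (OOV) user or item to a random embedding bucket, hash its feature vector with random-projection LSH so that feature-similar OOV entities land in the same bucket and thus share an embedding row. All names and parameters here (OOVBucketAssigner, num_hyperplanes, feature_dim) are illustrative assumptions, not the paper's API.

```python
import numpy as np

class OOVBucketAssigner:
    """Illustrative mapping from OOV feature vectors to shared embedding buckets."""

    def __init__(self, feature_dim: int, num_hyperplanes: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random hyperplanes for sign-based (SimHash-style) LSH.
        self.hyperplanes = rng.standard_normal((num_hyperplanes, feature_dim))
        self.num_buckets = 2 ** num_hyperplanes

    def lsh_bucket(self, features: np.ndarray) -> int:
        # Each hyperplane contributes one bit: which side of it the feature vector falls on.
        bits = (self.hyperplanes @ features) > 0
        return int(np.dot(bits, 1 << np.arange(bits.size)))

    def random_bucket(self, rng: np.random.Generator) -> int:
        # The random-bucket baseline mentioned in the abstract: ignores features entirely.
        return int(rng.integers(self.num_buckets))


if __name__ == "__main__":
    assigner = OOVBucketAssigner(feature_dim=16)
    rng = np.random.default_rng(1)
    base = rng.standard_normal(16)
    similar = base + 0.01 * rng.standard_normal(16)   # near-duplicate features
    dissimilar = rng.standard_normal(16)

    # Feature-similar OOV entities tend to share a bucket (and hence an embedding);
    # a dissimilar one usually does not. The random baseline gives no such guarantee.
    print(assigner.lsh_bucket(base), assigner.lsh_bucket(similar), assigner.lsh_bucket(dissimilar))
    print(assigner.random_bucket(rng), assigner.random_bucket(rng))
```

In a setup like this, each bucket indexes a row of a small shared OOV embedding table, so the plug-and-play replacement only changes how OOV indices are chosen and leaves the base model untouched.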

Updated: 2024-03-28