A hybrid similarity model for mitigating the cold-start problem of collaborative filtering in sparse data

Expert Systems with Applications (IF 8.5), Pub Date: 2024-03-20, DOI: 10.1016/j.eswa.2024.123700
Jiewen Guan, Bilian Chen, Shenbao Yu

Similarity is a vital component of neighborhood-based collaborative filtering (CF). To improve recommendation quality, many similarity methods have been proposed and analyzed in recent decades. However, nearly all traditional similarity methods, and many advanced ones, use only the items co-rated by two users to compute their similarity, which provides limited information in cold-start/sparse scenarios and yields misleading results. Although a few advanced hybrid similarity models also consider items beyond the co-rated ones, which partly mitigates this limitation, they still have drawbacks, such as failing to penalize non-co-rated items, which have many disadvantages. In this paper, we explore a new robust hybrid similarity model, the Wasserstein distance-based CF (WCF) model, for mitigating the cold-start problem of CF in sparse data. Specifically, we measure item similarity via the Wasserstein distance, which helps circumvent the drawbacks of the Bhattacharyya coefficient and KL divergence used in the literature and is thus more robust in cold-start/sparse scenarios. We further design a new multiplicative user similarity formula that regards all non-co-rated items as a whole, prioritizing the importance of co-rated items and weakening the negative effects of non-co-rated items, which also plays an important role in cold-start/sparse scenarios. As supplements, we propose two novel heuristic similarity factors that weaken the negative effects of popular users and items. We conduct extensive experiments on five real-world benchmark recommendation datasets to test WCF. The experimental results show the superiority of WCF over existing similarity methods in cold-start/sparse scenarios.
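The abstract's argument for the Wasserstein distance can be illustrated with a small sketch. It is not the WCF formula from the paper; it only compares three measures on item rating histograms. The 5-star scale, the helper names, and the toy ratings are all assumptions made for illustration. When two items have rating histograms with no overlapping support, the Bhattacharyya coefficient is 0 and the KL divergence is infinite regardless of how far apart the ratings are, whereas the 1-D Wasserstein distance still reflects that distance.

```python
import math
from collections import Counter

RATING_SCALE = (1, 2, 3, 4, 5)  # assumption: a 5-star rating scale


def rating_histogram(ratings, scale=RATING_SCALE):
    """Normalized histogram of an item's (non-empty) ratings over the scale."""
    if not ratings:
        raise ValueError("ratings must be non-empty")
    counts = Counter(ratings)
    return [counts[r] / len(ratings) for r in scale]


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two histograms on unit-spaced levels.

    In one dimension this equals the sum of absolute CDF differences.
    """
    dist, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        dist += abs(cdf_gap)
    return dist


def bhattacharyya_coefficient(p, q):
    """Overlap measure in [0, 1]; 0 whenever the supports are disjoint."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def kl_divergence(p, q):
    """KL(p || q); infinite when q has zero mass where p does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


# Toy items with very few ratings, as in a cold-start setting (hypothetical data).
item_a = rating_histogram([5, 5, 5])
item_b = rating_histogram([4, 4])
item_c = rating_histogram([1, 1])

for name, other in (("B", item_b), ("C", item_c)):
    print(
        f"A vs {name}: "
        f"W1={wasserstein_1d(item_a, other)}, "
        f"BC={bhattacharyya_coefficient(item_a, other)}, "
        f"KL={kl_divergence(item_a, other)}"
    )
```

In this toy setting only the Wasserstein distance ranks item B (rated 4) as closer to item A (rated 5) than item C (rated 1); the other two measures return the same value for both pairs. How a distance would be turned into a similarity score is left open here, since the abstract does not specify it.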

Updated: 2024-03-20