Robust Collaborative Filtering to Popularity Distribution Shift
ACM Transactions on Information Systems (IF 5.6). Pub Date: 2024-01-22. DOI: 10.1145/3627159
An Zhang, Wenchang Ma, Jingnan Zheng, Xiang Wang, Tat-Seng Chua

In leading collaborative filtering (CF) models, the representations of users and items are prone to learning the popularity bias in the training data as shortcuts. These popularity shortcuts benefit in-distribution (ID) performance but generalize poorly to out-of-distribution (OOD) data, i.e., when the popularity distribution of the test data shifts with respect to that of the training data. To close the gap, debiasing strategies try to assess the shortcut degrees and remove them from the representations. However, they suffer from two deficiencies: (1) when measuring shortcut degrees, most strategies rely on a statistical metric from a single aspect (i.e., item frequency on the item side and user frequency on the user side), failing to capture the compositional shortcut degree of a user–item pair; (2) when mitigating shortcuts, many strategies assume that the test distribution is known in advance. These limitations result in low-quality debiased representations; worse still, such strategies attain OOD generalizability at the cost of ID performance.
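To illustrate the first deficiency, the toy Python snippet below contrasts a single-aspect statistic (item frequency) with a pair-level statistic. The compositional_degree function is only an illustrative assumption, not the measure used in the paper; it merely shows that the same item can warrant different shortcut degrees depending on which user interacts with it.

```python
from collections import Counter

# Toy interaction log: (user, item) pairs.
interactions = [("u1", "i1"), ("u1", "i2"), ("u1", "i1"),
                ("u2", "i1"), ("u3", "i1"), ("u3", "i3")]

item_freq = Counter(i for _, i in interactions)  # item-side popularity
user_freq = Counter(u for u, _ in interactions)  # user-side activity

def single_aspect_degree(user, item):
    # What most strategies use: a marginal statistic on one side only.
    return item_freq[item]

def compositional_degree(user, item):
    # Hypothetical pair-level statistic: the joint count normalized by both
    # marginals, so the same item gets a different shortcut degree
    # depending on who interacts with it.
    joint = sum(1 for u, i in interactions if u == user and i == item)
    return joint / (user_freq[user] * item_freq[item])

print(single_aspect_degree("u1", "i1"), single_aspect_degree("u3", "i1"))  # 4 and 4
print(compositional_degree("u1", "i1"), compositional_degree("u3", "i1"))  # ~0.167 vs 0.125
```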

In this work, we present a simple yet effective debiasing strategy, PopGo, which quantifies and reduces the interaction-wise popularity shortcut without making any assumptions about the test data. PopGo first learns a shortcut model, which yields the shortcut degree of a user–item pair based on their popularity representations. It then trains the CF model by adjusting its predictions with the interaction-wise shortcut degrees. By examining PopGo from both causal and information-theoretic perspectives, we justify why it encourages the CF model to capture the critical popularity-agnostic features while leaving the spurious popularity-relevant patterns out. We use PopGo to debias two high-performing CF models (matrix factorization [28] and LightGCN [19]) on four benchmark datasets. On both ID and OOD test sets, PopGo achieves significant gains over state-of-the-art debiasing strategies (e.g., DICE [71] and MACR [58]). Code and datasets are available at https://github.com/anzhang314/PopGo.
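To make the two-stage recipe concrete, the following is a minimal PyTorch-style sketch, assuming a multiplicative adjustment of the CF prediction by a frozen shortcut model's degree during training. The class names, the BPR-style objective, and the adjustment itself are illustrative assumptions for exposition; consult the released code at the repository above for the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShortcutModel(nn.Module):
    """Scores a user-item pair from popularity representations only."""
    def __init__(self, num_pop_levels, dim=64):
        super().__init__()
        # One embedding per popularity (frequency) bucket, not per user/item,
        # so the score depends only on how popular the user and item are.
        self.user_pop_emb = nn.Embedding(num_pop_levels, dim)
        self.item_pop_emb = nn.Embedding(num_pop_levels, dim)

    def forward(self, user_pop, item_pop):
        score = (self.user_pop_emb(user_pop) * self.item_pop_emb(item_pop)).sum(-1)
        return torch.sigmoid(score)  # interaction-wise shortcut degree in (0, 1)


class MF(nn.Module):
    """Plain matrix factorization, standing in for MF or LightGCN."""
    def __init__(self, num_users, num_items, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.item_emb = nn.Embedding(num_items, dim)

    def forward(self, users, items):
        return (self.user_emb(users) * self.item_emb(items)).sum(-1)


def adjusted_bpr_loss(cf_model, shortcut_model, users, pos_items, neg_items,
                      user_pop, pos_item_pop, neg_item_pop):
    """BPR-style loss where each CF prediction is rescaled by the frozen
    shortcut model's interaction-wise degree (an assumed adjustment;
    the paper's exact objective may differ)."""
    with torch.no_grad():  # the shortcut model is learned first and frozen here
        w_pos = shortcut_model(user_pop, pos_item_pop)
        w_neg = shortcut_model(user_pop, neg_item_pop)
    pos_scores = cf_model(users, pos_items) * w_pos
    neg_scores = cf_model(users, neg_items) * w_neg
    return -F.logsigmoid(pos_scores - neg_scores).mean()
```

Under this sketch, only the CF model is scored at inference time, so the popularity-relevant signal absorbed by the shortcut branch during training is left out of the final recommendations.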



