On the Effectiveness of Sampled Softmax Loss for Item Recommendation
ACM Transactions on Information Systems (IF 5.6). Pub Date: 2024-03-22, DOI: 10.1145/3637061
Jiancan Wu, Xiang Wang, Xingyu Gao, Jiawei Chen, Hongcheng Fu, Tianyu Qiu, Xiangnan He

The learning objective plays a fundamental role in building a recommender system. Most methods routinely adopt either a pointwise (e.g., binary cross-entropy) or a pairwise (e.g., BPR) loss to train the model parameters, while rarely paying attention to softmax loss, which assumes the probabilities of all classes sum to 1. This neglect is due to its computational complexity when scaling up to large datasets, and its intractability for streaming data where the complete item space is not always available. The sampled softmax (SSM) loss emerges as an efficient substitute for softmax loss. Its special case, InfoNCE loss, has been widely used in self-supervised learning and has exhibited remarkable performance in contrastive learning. Nonetheless, limited recommendation work uses the SSM loss as the learning objective. Worse still, to the best of our knowledge, none of it thoroughly explores its properties or answers the questions "Is SSM loss suitable for item recommendation?" and "What are the conceptual advantages of SSM loss compared with the prevalent losses?"
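For concreteness, below is a minimal PyTorch sketch of a sampled softmax loss of this kind, where the softmax is computed over one positive item and a set of sampled negatives per user; the function name, tensor shapes, and temperature value are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn.functional as F

def sampled_softmax_loss(user_emb, pos_emb, neg_emb, temperature=0.1):
    # user_emb: (B, d) user representations
    # pos_emb:  (B, d) positive item representations
    # neg_emb:  (B, N, d) N sampled negative items per user
    # Cosine similarity = dot product of L2-normalized vectors
    u = F.normalize(user_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    n = F.normalize(neg_emb, dim=-1)

    pos_score = (u * p).sum(dim=-1, keepdim=True) / temperature   # (B, 1)
    neg_score = torch.einsum('bd,bnd->bn', u, n) / temperature    # (B, N)

    # Softmax over the positive and the sampled negatives;
    # minimize the negative log-probability of the positive item.
    logits = torch.cat([pos_score, neg_score], dim=1)              # (B, 1+N)
    return -F.log_softmax(logits, dim=1)[:, 0].mean()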

In this work, we aim to offer a better understanding of SSM for item recommendation. Specifically, we first theoretically reveal three model-agnostic advantages: (1) mitigating popularity bias, which benefits long-tail recommendation; (2) mining hard negative samples, which offers informative gradients to optimize model parameters; and (3) maximizing the ranking metric, which facilitates top-K performance. However, based on our empirical studies, we recognize that the default choice of the cosine similarity function in SSM limits its ability to learn the magnitudes of representation vectors. As such, combining SSM with models that also fall short in adjusting magnitudes (e.g., matrix factorization) may result in poor representations. Going one step further, we provide a mathematical proof that the message passing schemes in graph convolution networks can adjust representation magnitude according to node degree, which naturally compensates for this shortcoming of SSM. Extensive experiments on four benchmark datasets justify our analyses, demonstrating the superiority of SSM for item recommendation. Our implementations are available in both TensorFlow and PyTorch.
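To illustrate how message passing can introduce degree-dependent magnitudes, below is a minimal sketch of one LightGCN-style propagation layer in PyTorch; the helper name and the edge representation are assumptions for illustration, not the paper's exact formulation.

import torch

def lightgcn_propagate(emb, edge_index, num_nodes):
    # emb: (num_nodes, d) node embeddings on the user-item graph
    # edge_index: (2, E) undirected edges, each listed in both directions
    src, dst = edge_index
    deg = torch.bincount(dst, minlength=num_nodes).clamp(min=1).float()
    # Symmetric normalization 1 / sqrt(d_src * d_dst), as in LightGCN
    norm = (deg[src] * deg[dst]).rsqrt()
    out = torch.zeros_like(emb)
    out.index_add_(0, dst, norm.unsqueeze(-1) * emb[src])
    # The aggregated magnitude depends on node degree, which can restore
    # the magnitude information that cosine-similarity-based SSM discards.
    return out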



Updated: 2024-03-22