On the Effectiveness of Sampled Softmax Loss for Item Recommendation

Abstract
The learning objective plays a fundamental role in building a recommender system. Most methods routinely adopt either a pointwise loss (e.g., binary cross-entropy) or a pairwise loss (e.g., BPR) to train the model parameters, while rarely paying attention to softmax loss, which assumes the probabilities of all classes sum to 1. Softmax loss is typically avoided due to its computational cost when scaling up to large datasets and its intractability for streaming data, where the complete item space is not always available. The sampled softmax (SSM) loss emerges as an efficient substitute for softmax loss. Its special case, InfoNCE loss, has been widely used in self-supervised learning and has exhibited remarkable performance in contrastive learning. Nonetheless, limited recommendation work adopts the SSM loss as the learning objective. Worse still, to the best of our knowledge, none explores its properties thoroughly or answers the questions “Is SSM loss suitable for item recommendation?” and “What are the conceptual advantages of SSM loss compared with the prevalent losses?”
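To make the objective concrete: the SSM loss approximates the full softmax normalization over the entire item catalog with a small set of sampled negatives. Below is a minimal NumPy sketch (illustrative only, not the paper's implementation), assuming cosine similarity, a temperature tau, and uniformly sampled negatives; all names are hypothetical.

```python
import numpy as np

def cosine(a, b):
    # Row-wise cosine similarity between two batches of vectors.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return (a * b).sum(axis=-1)

def ssm_loss(user, pos_item, neg_items, tau=0.2):
    """Sampled softmax (InfoNCE-style) loss for one user.

    user:      (d,)   user embedding
    pos_item:  (d,)   embedding of the interacted item
    neg_items: (n, d) embeddings of n sampled negative items
    """
    pos = cosine(user[None, :], pos_item[None, :]) / tau                     # (1,)
    negs = cosine(np.broadcast_to(user, neg_items.shape), neg_items) / tau   # (n,)
    logits = np.concatenate([pos, negs])
    # Negative log of the softmax probability assigned to the positive item,
    # normalized over the positive plus the sampled negatives only.
    return -(pos[0] - np.log(np.exp(logits).sum()))

rng = np.random.default_rng(0)
u, i = rng.normal(size=8), rng.normal(size=8)
negs = rng.normal(size=(64, 8))
print(ssm_loss(u, i, negs))
```

The temperature tau rescales the similarity scores: smaller values sharpen the distribution over negatives, which is what connects this loss to hard-negative mining.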
In this work, we aim to offer a better understanding of SSM for item recommendation. Specifically, we first theoretically reveal three model-agnostic advantages: (1) mitigating popularity bias, which benefits long-tail recommendation; (2) mining hard negative samples, which offers informative gradients to optimize model parameters; and (3) maximizing the ranking metric, which facilitates top-K performance. However, based on our empirical studies, we recognize that the default choice of the cosine similarity function in SSM limits its ability to learn the magnitudes of representation vectors. As such, combining SSM with models that also fall short in adjusting magnitudes (e.g., matrix factorization) may result in poor representations. Going one step further, we provide a mathematical proof that the message passing scheme in graph convolution networks can adjust representation magnitude according to node degree, which naturally compensates for this shortcoming of SSM. Extensive experiments on four benchmark datasets justify our analyses, demonstrating the superiority of SSM for item recommendation. Our implementations are available in both TensorFlow and PyTorch.
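The two observations above can be sketched in a few lines of NumPy (a toy example under assumed symmetric normalization, not the paper's code): cosine similarity is invariant to rescaling, so a cosine-based SSM cannot encode information in embedding magnitudes, whereas one round of degree-normalized message passing produces degree-dependent magnitudes even from identical input embeddings.

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1) Cosine similarity discards magnitude: rescaling an item vector
#    leaves the score unchanged.
u = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])
assert np.isclose(cos(u, v), cos(u, 10.0 * v))

# 2) A symmetrically normalized aggregation (LightGCN-style),
#    h_i' = sum_j h_j / sqrt(d_i * d_j), reintroduces degree information.
A = np.array([[0, 1, 1, 1],      # node 0 has degree 3
              [1, 0, 0, 0],      # nodes 1-3 have degree 1
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
H = np.ones((4, 2))              # identical input embeddings
H_out = D_inv_sqrt @ A @ D_inv_sqrt @ H

# After one pass, output magnitudes differ by node degree even though
# all inputs were identical.
print(np.linalg.norm(H_out, axis=1))
```

Here the high-degree node ends up with a larger representation norm than the degree-1 nodes, illustrating how message passing can compensate for the magnitude-blindness of a cosine-based loss.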