Abstract
Recommender systems are expected to act as assistants that help human users find relevant information automatically, without explicit queries. As recommender systems evolve, increasingly sophisticated learning techniques are applied and have achieved better performance on user engagement metrics such as clicks and browsing time. The increase in measured performance, however, has two possible explanations: a better understanding of user preferences, or a more proactive ability to exploit human bounded rationality to induce users to over-consume. A natural follow-up question is whether current recommendation algorithms are manipulating user preferences and, if so, whether the level of manipulation can be measured. In this article, we present a general framework for benchmarking the degree of manipulation of recommendation algorithms in both slate recommendation and sequential recommendation scenarios. The framework consists of four stages: initial preference calculation, training data collection, algorithm training and interaction, and metrics calculation, which involves two proposed metrics, Manipulation Score and Preference Shift. We benchmark several representative recommendation algorithms on both synthetic and real-world datasets under the proposed framework. We observe that a high online click-through rate does not necessarily reflect a better understanding of users' initial preferences; it can instead result from prompting users to choose more documents they did not initially favor. Moreover, we find that the training data have a notable impact on the degree of manipulation, and that algorithms with more powerful modeling abilities are more sensitive to this impact. The experiments also verify the usefulness of the proposed metrics for measuring the degree of manipulation. We advocate that future studies treat recommendation as an optimization problem subject to constraints on user preference manipulation.
Index Terms
- Understanding or Manipulation: Rethinking Online Performance Gains of Modern Recommender Systems