Reinforcement Learning-based Recommender Systems with Large Language Models for State Reward and Action Modeling
arXiv - CS - Information Retrieval. Pub Date: 2024-03-25, DOI: arxiv-2403.16948
Jie Wang, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose

Reinforcement Learning (RL)-based recommender systems have demonstrated promising performance in meeting user expectations by learning to make accurate next-item recommendations from historical user-item interactions. However, existing offline RL-based sequential recommendation methods face the challenge of obtaining effective user feedback from the environment: effectively modeling the user state and shaping an appropriate reward for recommendation remain open problems. In this paper, we leverage language understanding capabilities and adapt large language models (LLMs) as an environment (LE) to enhance RL-based recommenders. The LE is learned from a subset of user-item interaction data, thus reducing the need for large training data, and can synthesise user feedback for offline data by: (i) acting as a state model that produces high-quality states that enrich the user representation, and (ii) functioning as a reward model to accurately capture nuanced user preferences on actions. Moreover, the LE makes it possible to generate positive actions that augment the limited offline training data. We propose an LE Augmentation (LEA) method to further improve recommendation performance by jointly optimising the supervised component and the RL policy, using the augmented actions and historical user signals. We use LEA and the state and reward models in conjunction with state-of-the-art RL recommenders and report experimental results on two publicly available datasets.
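To make the high-level pipeline in the abstract concrete, below is a minimal, hypothetical sketch of how an LLM-derived environment (LE) could supply states and rewards to an offline RL recommender, with LE-proposed positive actions augmenting the supervised objective. All class and function names (LLMEnvironment, Recommender, lea_training_step), the GRU backbones, the loss forms, and the hyper-parameters are illustrative assumptions, not the authors' implementation; in the paper the LE would be an adapted LLM rather than the small stand-in module used here.

```python
# Illustrative sketch only: a stand-in LE providing states/rewards/augmented actions,
# and a joint supervised + one-step TD update over logged interactions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LLMEnvironment(nn.Module):
    """Stand-in for the LE: maps an interaction history to an enriched state vector
    and scores candidate actions as rewards. A real LE would be a fine-tuned LLM."""
    def __init__(self, n_items, dim):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.state_enc = nn.GRU(dim, dim, batch_first=True)
        self.reward_head = nn.Linear(dim, n_items)

    def state(self, history):                       # history: (B, T) item ids
        _, h = self.state_enc(self.item_emb(history))
        return h.squeeze(0)                          # (B, dim) enriched user state

    def reward(self, state, action):                 # scalar reward per (state, action)
        scores = torch.sigmoid(self.reward_head(state))
        return scores.gather(1, action.unsqueeze(1)).squeeze(1)

    def augment(self, state, k=1):                   # LE-proposed positive actions
        return self.reward_head(state).topk(k, dim=1).indices  # (B, k)

class Recommender(nn.Module):
    """Backbone with a supervised next-item head and a Q-value head (RL policy)."""
    def __init__(self, n_items, dim):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.sup_head = nn.Linear(dim, n_items)
        self.q_head = nn.Linear(2 * dim, n_items)    # conditions on the LE state too

    def forward(self, history, le_state):
        _, h = self.encoder(self.item_emb(history))
        h = h.squeeze(0)
        return self.sup_head(h), self.q_head(torch.cat([h, le_state], dim=1))

def lea_training_step(rec, le, opt, history, next_item, gamma=0.9, aug_weight=0.5):
    """One hypothetical LEA update: cross-entropy on logged + LE-augmented actions,
    plus a one-step TD loss using LE rewards (a simplification of offline RL)."""
    with torch.no_grad():
        s = le.state(history)                        # enriched state from the LE
        r = le.reward(s, next_item)                  # nuanced reward for the action
        aug_items = le.augment(s).squeeze(1)         # extra positive actions
    logits, q = rec(history, s)
    sup_loss = F.cross_entropy(logits, next_item) \
             + aug_weight * F.cross_entropy(logits, aug_items)   # augmentation term
    with torch.no_grad():
        next_hist = torch.cat([history[:, 1:], next_item.unsqueeze(1)], dim=1)
        _, q_next = rec(next_hist, le.state(next_hist))
        target = r + gamma * q_next.max(dim=1).values            # one-step TD target
    rl_loss = F.mse_loss(q.gather(1, next_item.unsqueeze(1)).squeeze(1), target)
    loss = sup_loss + rl_loss                                     # joint optimisation
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

if __name__ == "__main__":
    n_items, dim = 1000, 64
    le, rec = LLMEnvironment(n_items, dim), Recommender(n_items, dim)
    opt = torch.optim.Adam(rec.parameters(), lr=1e-3)
    history = torch.randint(0, n_items, (8, 10))     # toy batch of interaction histories
    next_item = torch.randint(0, n_items, (8,))
    print(lea_training_step(rec, le, opt, history, next_item))
```

The design choice mirrored here is the one stated in the abstract: the LE is frozen while the recommender is updated, so its states enrich the user representation, its rewards replace missing environment feedback, and its generated positive actions expand the limited offline data feeding the supervised component.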

Updated: 2024-03-27