Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems
arXiv - CS - Machine Learning | Pub Date: 2024-03-26 | DOI: arxiv-2403.17634
Siyu Wang, Xiaocong Chen, Lina Yao

Reinforcement Learning-based Recommender Systems (RLRS) have shown promise across a spectrum of applications, from e-commerce platforms to streaming services. Yet they grapple with persistent challenges, notably in crafting reward functions and in harnessing large pre-existing datasets within the RL framework. Recent advances in offline RLRS offer a way to address both challenges. However, existing methods mainly rely on the transformer architecture, whose computational and training costs grow steeply as sequence lengths increase. Additionally, the prevalent methods employ fixed-length input trajectories, restricting their capacity to capture evolving user preferences. In this study, we introduce a new offline RLRS method to address these problems. We reinterpret the RLRS challenge by modeling sequential decision-making as an inference task, leveraging adaptive masking configurations. This adaptive approach selectively masks input tokens, transforming the recommendation task into an inference challenge based on varying token subsets, thereby enhancing the agent's ability to infer across diverse trajectory lengths. Furthermore, we incorporate a multi-scale segmented retention mechanism that facilitates efficient modeling of long sequences, significantly enhancing computational efficiency. Our experimental analysis, conducted on both an online simulator and offline datasets, clearly demonstrates the advantages of our proposed method.
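The abstract names two mechanisms: adaptive masking over trajectory tokens, and a multi-scale segmented retention for long sequences. As a rough illustration only (not the authors' implementation), the sketch below pairs a random-ratio mask sampler with a single-head RetNet-style chunkwise retention pass in PyTorch; the function names, chunk size, masking-ratio range, and per-head decay schedule are all assumptions.

```python
import torch

def sample_adaptive_mask(seq_len: int, max_ratio: float = 0.5) -> torch.Tensor:
    """Draw a random per-trajectory masking ratio, then a Boolean keep-mask.

    Stand-in for the paper's adaptive masking: by varying which (and how many)
    tokens stay visible, the model is trained to infer actions from token
    subsets of different effective lengths. The uniform ratio range is an
    assumption; the paper may schedule it differently.
    """
    ratio = torch.empty(()).uniform_(0.0, max_ratio).item()
    keep = torch.rand(seq_len) >= ratio
    if not keep.any():                          # never mask everything
        keep[torch.randint(seq_len, (1,))] = True
    return keep                                 # True = token stays visible

def chunkwise_retention(q, k, v, gamma: float, chunk: int = 64) -> torch.Tensor:
    """Single-head RetNet-style retention, computed segment by segment.

    q, k: (T, d_k); v: (T, d_v); gamma: decay in (0, 1). Equivalent to the
    parallel form o_n = sum_{m<=n} gamma^(n-m) (q_n . k_m) v_m, but only
    materialises chunk-by-chunk score matrices, so cost grows linearly in T
    for a fixed chunk size.
    """
    T, d_k = q.shape
    state = q.new_zeros(d_k, v.shape[1])        # cross-segment summary S
    outs = []
    for s in range(0, T, chunk):
        Q, K, V = q[s:s+chunk], k[s:s+chunk], v[s:s+chunk]
        B = Q.shape[0]
        idx = torch.arange(B, dtype=q.dtype, device=q.device)
        diff = idx[:, None] - idx[None, :]
        # Causal decay matrix: D[b, c] = gamma^(b-c) for b >= c, else 0.
        D = torch.where(diff >= 0, gamma ** diff, torch.zeros_like(diff))
        inner = (Q @ K.T * D) @ V                            # within-segment term
        cross = (gamma ** (idx + 1))[:, None] * (Q @ state)  # carried-over past
        outs.append(inner + cross)
        # S <- gamma^B * S + sum_b gamma^(B-1-b) * outer(k_b, v_b)
        state = (gamma ** B) * state + (K * (gamma ** (B - 1 - idx))[:, None]).T @ V
    return torch.cat(outs, dim=0)

# "Multi-scale": each head gets its own decay, e.g. RetNet's schedule.
heads = 4
gammas = 1.0 - 2.0 ** (-5.0 - torch.arange(heads, dtype=torch.float))

T, d = 256, 32
x = torch.randn(T, d)
x = x * sample_adaptive_mask(T)[:, None]        # zero out masked tokens (simplified)
out = torch.stack([chunkwise_retention(x, x, x, g.item()) for g in gammas])
print(out.shape)                                # (4, 256, 32)
```

The segmented loop touches each token once and keeps only a d_k-by-d_v summary between segments, which is where the claimed efficiency on long sequences comes from; the multi-scale aspect enters through the per-head decays gamma_h.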

Updated: 2024-03-27