WagerWin: An Efficient Reinforcement Learning Framework for Gambling Games
IEEE Transactions on Games (IF 2.3), Pub Date: 2022-12-05, DOI: 10.1109/tg.2022.3226526
Haoli Wang, Hejun Wu, Guoming Lai

Although reinforcement learning (RL) has achieved great success in diverse scenarios, complex gambling games still pose great challenges for RL. Common deep RL methods have difficulty maintaining stability and efficiency in such games. Through theoretical analysis, we find that the return distribution of a gambling game is an intrinsic cause of this problem. The return distribution splits into two parts according to the win/lose outcome, representing the gain and the loss, respectively. These two parts repel each other because the player keeps “raising,” i.e., increasing the wager. However, common deep RL methods directly approximate the expectation of the return without accounting for this structure of the distribution, which introduces a redundant loss term into the objective function and, consequently, high variance. In this work, we propose WagerWin, a new framework for gambling games. WagerWin introduces probability and value factorization to construct a more effective value function, and its training objective removes the redundant loss term. In addition, WagerWin supports customized policy adaptation, which can tune a pretrained policy toward different playing inclinations. We conduct extensive experiments on DouDizhu and SmallDou, a reduced version of DouDizhu. The results demonstrate that WagerWin outperforms the original state-of-the-art RL model in both training efficiency and stability.
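The abstract describes factorizing the value estimate into a win probability and conditional win/lose values rather than regressing the scalar return directly. The sketch below is not the authors' implementation; it is a minimal illustration of that factorization idea, assuming a PyTorch value head with hypothetical names (FactorizedValueHead, win_prob, win_value, lose_value) and combining the pieces as Q(s, a) = p · v_win + (1 − p) · v_lose.

```python
# Minimal sketch (illustrative, not the paper's code) of a probability/value
# factorized value head: predict P(win | s, a) plus conditional values for the
# win and lose branches, then combine them into an expected-return estimate.

import torch
import torch.nn as nn


class FactorizedValueHead(nn.Module):
    def __init__(self, feature_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
        )
        self.win_prob = nn.Linear(hidden_dim, 1)    # logit of P(win | s, a)
        self.win_value = nn.Linear(hidden_dim, 1)   # expected return if the game is won
        self.lose_value = nn.Linear(hidden_dim, 1)  # expected return if the game is lost

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        h = self.backbone(features)
        p = torch.sigmoid(self.win_prob(h))
        v_win = self.win_value(h)
        v_lose = self.lose_value(h)
        # Factorized expected return: p * v_win + (1 - p) * v_lose.
        return p * v_win + (1.0 - p) * v_lose


# Usage example: a batch of 32 state-action feature vectors of dimension 512.
q = FactorizedValueHead(feature_dim=512)(torch.randn(32, 512))
```

Splitting the head this way also suggests how a pretrained policy could be re-weighted toward more or less aggressive wagering by adjusting how the win and lose branches are combined, in the spirit of the customized policy adaptation mentioned in the abstract.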

Updated: 2022-12-05