Improved learning efficiency of deep Monte-Carlo for complex imperfect-information card games
Applied Soft Computing (IF 8.7) | Pub Date: 2024-03-26 | DOI: 10.1016/j.asoc.2024.111545
Qian Luo, Tien-Ping Tan

Deep Reinforcement Learning (DRL) has achieved considerable success in games of both perfect and imperfect information, such as Go, Texas Hold’em, Stratego, and DouDiZhu. Nevertheless, training a state-of-the-art model for complex imperfect-information card games like DouDiZhu and Big2 remains resource- and time-intensive. To address this challenge, this paper introduces two methods, the Opponent Model and Optimized Deep Monte-Carlo (ODMC), designed to improve the training efficiency of Deep Monte-Carlo (DMC) for imperfect-information card games. The Opponent Model predicts hidden information, which accelerates the agent’s learning in DMC compared with the original training that uses only observed information as input features. In ODMC, Minimum Combination Search (MCS), a heuristic search algorithm based on dynamic programming, computes the minimum combination of actions in the current state; ODMC uses MCS to filter out suboptimal actions in each state. This shrinks the action space DMC must consider, so training converges faster by focusing on the most promising actions. The effectiveness of the proposed approach is evaluated on two complex imperfect-information card games, DouDiZhu and Big2. Ablation experiments evaluate the Opponent Model (D+OM and B+OM) and ODMC (D+ODMC and B+ODMC), along with their combined variants (D+OMODMC and B+OMODMC). Furthermore, D+OMODMC and B+OMODMC are compared with state-of-the-art DouDiZhu and Big2 artificial intelligence (AI) programs, respectively. The experimental results show that the proposed methods match the performance of the original DMC while requiring only 25.5% of its training time on the same device. These findings help reduce both the hardware requirements and the training time for complex imperfect-information card games.
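To make the two ideas concrete, here is a minimal sketch of how an opponent model might be wired into a DMC pipeline. This is an illustration under assumptions, not the paper’s architecture: the feature dimension obs_dim, the layer sizes, and the 54-card output (a standard DouDiZhu deck including jokers) are all placeholders.

```python
import torch
import torch.nn as nn

class OpponentModel(nn.Module):
    """Predicts, for each of the 54 DouDiZhu cards, the probability that
    a given opponent holds it, from the agent's observable features.
    All dimensions here are illustrative placeholders."""
    def __init__(self, obs_dim: int = 372, num_cards: int = 54):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, num_cards),
            nn.Sigmoid(),  # per-card probability of being in the opponent's hand
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# The prediction augments the observed features before they reach the
# DMC value network, e.g.:
#   dmc_input = torch.cat([obs, opponent_model(obs)], dim=-1)
```

MCS also lends itself to a compact dynamic-programming sketch. The version below uses a toy move set (singles, pairs, and triples over a few ranks) rather than DouDiZhu’s full rules; min_combinations and mcs_filter are hypothetical names, and the pruning rule, keeping only moves that lie on some shortest play-out of the hand, is one plausible reading of how suboptimal actions are filtered.

```python
from functools import lru_cache

def legal_moves(hand):
    """Toy move set: play 1, 2, or 3 copies of a single rank.
    hand is a tuple of per-rank counts, e.g. (3, 2, 0, 1)."""
    return [(rank, size)
            for rank, count in enumerate(hand)
            for size in (1, 2, 3)
            if count >= size]

def play(hand, move):
    rank, size = move
    remaining = list(hand)
    remaining[rank] -= size
    return tuple(remaining)

@lru_cache(maxsize=None)
def min_combinations(hand):
    """DP over hand states: fewest moves needed to empty the hand."""
    if not any(hand):
        return 0
    return 1 + min(min_combinations(play(hand, m)) for m in legal_moves(hand))

def mcs_filter(hand):
    """Keep only moves consistent with a shortest play-out; DMC then
    evaluates this reduced action set instead of all legal moves."""
    best = min_combinations(hand)
    return [m for m in legal_moves(hand)
            if 1 + min_combinations(play(hand, m)) == best]

if __name__ == "__main__":
    hand = (3, 2, 0, 1)            # three of rank 0, two of rank 1, one of rank 3
    print(min_combinations(hand))  # -> 3 (triple, pair, single)
    print(mcs_filter(hand))        # keeps (0, 3), (1, 2), (3, 1); prunes moves
                                   # that break up the triple or the pair
```

Because the memoized search only ever recurses into strictly smaller hands, the DP terminates, and the same cache can be shared across the many states DMC samples during training.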
