Evolving the MCTS Upper Confidence Bounds for Trees Using a Semantic-Inspired Evolutionary Algorithm in the Game of Carcassonne, IEEE Transactions on Games


IEEE Transactions on Games (IF 2.3), Pub Date: 2022-08-31, DOI: 10.1109/tg.2022.3203232
Edgar Galvan, Gavin Simpson, Fred Valdez Ameneyro

Monte Carlo tree search (MCTS) is a sampling-based best-first method for searching for optimal decisions. One popular selection mechanism that has proved reliable in MCTS is based on the Upper Confidence Bounds for Trees (UCT), which attempts to balance exploration and exploitation. However, some tuning of the MCTS UCT is necessary for this to work well. In this work, we use Evolutionary Algorithms (EAs) to evolve mathematical expressions with the goal of replacing the UCT formula, and we use the evolved expressions in MCTS. Specifically, we evolve expressions using our proposed semantic-inspired evolutionary algorithm in MCTS (SIEA-MCTS). This is inspired by semantics in Genetic Programming (GP), where the use of fitness cases is seen as a requirement. Fitness cases are normally used to determine the fitness of individuals and can be used to compute the semantic similarity (or dissimilarity) of individuals. However, fitness cases are not available in MCTS. We extend this notion by using multiple reward values from MCTS that allow us to determine both the fitness values of individuals and their semantics. We show that SIEA-MCTS is able to successfully evolve expressions that yield better or competitive results compared to UCT. We compare the performance of the proposed SIEA-MCTS against MCTS algorithms, MCTS rapid action value estimation algorithms, three variants of the *-minimax family of algorithms, a random controller, and two other EA approaches. We consistently show that SIEA-MCTS outperforms most of these intelligent controllers in the game of Carcassonne.
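To make the role of the UCT formula concrete, the following is a minimal sketch of UCT-based child selection in its commonly cited form, average reward plus C * sqrt(ln N / n). It is not taken from the paper: the names `uct_score` and `select_child`, the (total reward, visits) representation of a child, and the default constant C = sqrt(2) are all assumptions made for illustration. The scoring function is passed in as a parameter, which is one plausible way an evolved expression could stand in for UCT.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical representation: each child is (total_reward, visit_count).
ChildStats = Tuple[float, int]
# A scoring function takes (total_reward, visits, parent_visits).
ScoreFn = Callable[[float, int, int], float]


def uct_score(total_reward: float, visits: int, parent_visits: int,
              c: float = math.sqrt(2)) -> float:
    """Commonly cited UCT value: mean reward plus an exploration bonus."""
    if visits == 0:
        # Unvisited children are tried first.
        return math.inf
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children: Sequence[ChildStats], parent_visits: int,
                 score: ScoreFn = uct_score) -> int:
    """Return the index of the child with the highest score.

    `score` defaults to UCT; any other expression with the same
    inputs (for example an evolved one) can be supplied instead.
    """
    scores = [score(q, n, parent_visits) for q, n in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With two children holding (9.0, 10) and (3.0, 4) under a parent visited 14 times, the default score favors the less-visited second child because of its larger exploration bonus, whereas a purely greedy score such as `lambda q, n, N: q / n` picks the first.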


Updated: 2022-08-31
