Benchmarking ensemble genetic programming with a linked list external memory on scalable partially observable tasks,Genetic Programming and Evolvable Machines



Benchmarking ensemble genetic programming with a linked list external memory on scalable partially observable tasks
Genetic Programming and Evolvable Machines (IF 2.6), Pub Date: 2022-11-30, DOI: 10.1007/s10710-022-09446-8
Mihyar Al Masalma, Malcolm Heywood

Reactive learning agents cannot solve partially observable sequential decision-making tasks as they are limited to defining outcomes purely in terms of the observable state. However, augmenting reactive agents with external memory might provide a path for addressing this limitation. In this work, external memory takes the form of a linked list data structure that programs have to learn how to use. We identify conditions under which additional recurrent connectivity from program output to input is necessary for state disambiguation. Benchmarking against recent results from the neural network literature on three scalable partially observable sequential decision-making tasks demonstrates that the proposed approach scales much more effectively. Indeed, solutions are shown to generalize to far more difficult sequences than those experienced under training conditions. Moreover, recommendations are made regarding the instruction set and additional benchmarking is performed with input state values designed to explicitly disrupt the identification of useful states for later recall. The protected division operator appears to be particularly useful in developing simple solutions to all three tasks.
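To make two of the terms in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. `protected_div` follows the common genetic programming convention of returning a fixed value when the denominator is zero; the fallback of 1.0 and the `eps` threshold are assumptions. `ListMemory` and its `write`, `read`, and `advance` operations are hypothetical names, chosen only to illustrate what a list-based external memory that programs must learn to use might look like.

```python
def protected_div(numerator: float, denominator: float, eps: float = 1e-9) -> float:
    """Division that never raises: falls back to 1.0 for a (near) zero denominator."""
    if abs(denominator) < eps:
        return 1.0  # assumed fallback value; a common GP convention
    return numerator / denominator


class ListMemory:
    """Hypothetical external memory: an ordered list of values plus a read cursor."""

    def __init__(self):
        self.cells = []   # stored values, oldest first
        self.cursor = 0   # index of the cell that read() returns

    def write(self, value: float) -> None:
        """Append a value at the tail of the list."""
        self.cells.append(value)

    def read(self) -> float:
        """Return the value under the cursor, or 0.0 if nothing has been stored."""
        if not self.cells:
            return 0.0
        return self.cells[self.cursor]

    def advance(self) -> None:
        """Move the cursor one cell toward the tail, stopping at the last cell."""
        if self.cursor < len(self.cells) - 1:
            self.cursor += 1


# Usage: store two observations, then recall them in order.
memory = ListMemory()
memory.write(3.0)
memory.write(4.0)
first = memory.read()
memory.advance()
second = memory.read()
ratio = protected_div(first, second)
safe = protected_div(first, 0.0)  # falls back instead of raising ZeroDivisionError
```

A reactive program sees only the current observation, so any earlier value it needs later must be written to such a memory and read back explicitly. The sketch shows only the data structure, not how evolved programs learn when to write or read.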


Updated: 2022-12-01
