‘I don’t want to play with you anymore’: dynamic partner judgements in moody reinforcement learners playing the prisoner’s dilemma
The Knowledge Engineering Review (IF 2.1) Pub Date: 2024-03-26, DOI: 10.1017/s0269888924000018
Grace Feehan, Shaheen Fatima

Emerging reinforcement learning algorithms that incorporate human traits into their conceptual architecture have been shown to encourage cooperation in social dilemmas compared to their unaltered counterparts. In particular, the addition of a mood mechanism facilitates more cooperative behaviour in multi-agent iterated prisoner's dilemma (IPD) games, in both static and dynamic network contexts. Mood-altered agents also exhibit humanlike behavioural trends when environmental aspects of the dilemma are altered, such as the structure of the payoff matrix used. Other environmental effects identified in both human and agent-based research may interact with moody structures in previously unstudied ways. As the literature on these interactions is currently small, we seek to expand on previous research by introducing two more environmental dimensions: voluntary interaction in dynamic networks, and stability of interaction through varied network restructuring. Starting from an initial Erdős–Rényi random network, we manipulated the structure of a networked IPD according to existing methodology in human-based research, to investigate whether its findings replicate. We also facilitated strategic selection of opponents by introducing two partner evaluation mechanisms, and tested two selection thresholds for each. We found that even minimally strategic play termination in dynamic networks is enough to raise cooperation above the static level, though the thresholds for these strategic decisions are critical to the desired outcomes. More forgiving thresholds maintained cooperation between kinder strategies better than stricter ones did, despite overall cooperation levels being relatively low. Additionally, moody reinforcement learning combined with certain play-termination decision strategies can mimic the trends in human cooperation produced by structural changes to the IPD played on dynamic networks, as can kind and simple strategies such as Tit-For-Tat. Implications of these results in comparison with human data are discussed, and suggestions for diversifying further testing are made.
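To make the setup concrete, the following is a minimal sketch, not the authors' implementation, of the kind of experiment the abstract describes: moody Q-learners playing the IPD over the edges of an Erdős–Rényi graph, where either side may sever a link ("I don't want to play with you anymore") when a partner's recent cooperation rate falls below a selection threshold. The mood update rule, the evaluation window, and all parameter values are illustrative assumptions rather than details taken from the paper.

```python
# Sketch of moody Q-learners playing the IPD on a dynamic Erdos-Renyi network.
# All names, parameters, and the mood/evaluation rules are assumptions for
# illustration; they are not the paper's mechanisms.
import random
import networkx as nx

C, D = 0, 1                                   # cooperate / defect
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5),
          (D, C): (5, 0), (D, D): (1, 1)}     # standard IPD payoff matrix

class MoodyQLearner:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {C: 0.0, D: 0.0}
        self.mood = 0.5                       # assumed scalar mood in [0, 1]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.history = {}                     # partner id -> partner's observed moves

    def act(self):
        # Assumed mood effect: a gloomier agent explores more.
        eps = self.epsilon + (1.0 - self.mood) * 0.2
        if random.random() < eps:
            return random.choice((C, D))
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        self.q[action] += self.alpha * (
            reward + self.gamma * max(self.q.values()) - self.q[action])
        # Assumed mood update: drift toward the normalized payoff.
        self.mood += 0.1 * (reward / 5.0 - self.mood)

    def partner_coop_rate(self, partner, window=10):
        moves = self.history.get(partner, [])[-window:]
        return sum(m == C for m in moves) / len(moves) if moves else 1.0

def run(n=50, p=0.1, rounds=200, threshold=0.3):
    """Play the IPD over the edges of an Erdos-Renyi graph each round;
    sever any edge whose partner falls below the cooperation threshold."""
    g = nx.erdos_renyi_graph(n, p)
    agents = {i: MoodyQLearner() for i in g.nodes}
    for _ in range(rounds):
        for u, v in list(g.edges):
            au, av = agents[u], agents[v]
            mu, mv = au.act(), av.act()
            ru, rv = PAYOFF[(mu, mv)]
            au.update(mu, ru); av.update(mv, rv)
            au.history.setdefault(v, []).append(mv)
            av.history.setdefault(u, []).append(mu)
            # Voluntary play termination: either side may cut the link.
            if (au.partner_coop_rate(v) < threshold
                    or av.partner_coop_rate(u) < threshold):
                g.remove_edge(u, v)
    return g, agents

if __name__ == "__main__":
    g, agents = run()
    print(f"edges remaining: {g.number_of_edges()}")
```

A stricter threshold in this sketch severs links after fewer defections, which mirrors the abstract's finding that the choice of selection threshold, forgiving versus strict, is critical to whether cooperation between kinder strategies survives.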



Updated: 2024-03-26