DSMC Evaluation Stages: Fostering Robust and Safe Behavior in Deep Reinforcement Learning – Extended Version
ACM Transactions on Modeling and Computer Simulation (IF 0.9) Pub Date: 2023-10-26, DOI: 10.1145/3607198
Timo P. Gros 1, Joschka Groß 1, Daniel Höller 1, Jörg Hoffmann 2, Michaela Klauck 1, Hendrik Meerkamp 1, Nicola J. Müller 1, Lukas Schaller 1, Verena Wolf 3

Neural networks (NN) are gaining importance in sequential decision-making. Deep reinforcement learning (DRL), in particular, is extremely successful in learning action policies in complex and dynamic environments. Despite this success, DRL technology is not without its failures, especially in safety-critical applications: (i) the training objective maximizes average rewards, which may disregard rare but critical situations and hence lack local robustness; (ii) optimization objectives targeting safety typically yield degenerate reward structures, which, for DRL to work, must be replaced with proxy objectives. Here, we introduce a methodology that can help to address both deficiencies. We incorporate evaluation stages (ES) into DRL, leveraging recent work on deep statistical model checking (DSMC), which verifies NN policies in Markov decision processes. Our ES apply DSMC at regular intervals to determine state-space regions with weak performance. Based on the outcome, we adapt the subsequent DRL training priorities, (i) focusing DRL on critical situations and (ii) making it possible to foster arbitrary objectives.
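
The evaluation-stage idea lends itself to a compact sketch. The following is a minimal illustration only, not the authors' implementation: `policy_quality`, `train_step`, and the per-region priority map are hypothetical stand-ins, and the DSMC verification step is replaced here by a plain Monte Carlo success estimate.

```python
import random

def policy_quality(policy, state, episodes=100):
    """Monte Carlo estimate of the probability that `policy` succeeds
    from `state` (a stand-in for a DSMC goal-probability query)."""
    successes = sum(bool(policy(state)) for _ in range(episodes))
    return successes / episodes

def train_with_evaluation_stages(policy, train_step, initial_states,
                                 total_steps=3_000, es_interval=1_000):
    """Interleave DRL training with evaluation stages: every `es_interval`
    steps, re-estimate per-region quality and re-prioritise start states
    so that weakly performing regions are trained on more often."""
    priorities = {s: 1.0 for s in initial_states}  # uniform before the first ES
    for step in range(1, total_steps + 1):
        states, weights = zip(*priorities.items())
        start = random.choices(states, weights=weights, k=1)[0]
        train_step(policy, start)                  # one ordinary DRL update
        if step % es_interval == 0:                # evaluation stage
            for s in initial_states:
                # Low estimated quality -> high future training priority.
                # Keep a small floor so no region is dropped entirely.
                priorities[s] = max(1.0 - policy_quality(policy, s), 0.05)

if __name__ == "__main__":
    # Toy stand-ins so the sketch executes: a "policy" that succeeds half
    # the time and a no-op training step.
    toy_policy = lambda state: random.random() < 0.5
    toy_train_step = lambda policy, start: None
    train_with_evaluation_stages(toy_policy, toy_train_step,
                                 initial_states=["regionA", "regionB", "regionC"])
```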

We run case studies on two benchmarks. One of them is Racetrack, an abstraction of autonomous driving in which the agent must navigate a map without crashing into a wall. The other is MiniGrid, a benchmark widely used in the AI community. Our results show that DSMC-based ES can yield significant improvements with respect to both (i) and (ii).
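
For concreteness, here is a toy, self-contained sketch of the Racetrack idea as described above: the agent controls acceleration on a grid map and must reach the goal without hitting a wall. The map layout, symbols, and `step` signature are invented for this illustration and are not the benchmark's actual code.

```python
# Map symbols made up for this sketch: '#' wall, 'S' start, 'G' goal, '.' free track.
MAP = [
    "#######",
    "#S...G#",
    "#######",
]

def step(pos, vel, accel):
    """One Racetrack-style transition: acceleration changes velocity, velocity
    changes position; leaving the map or hitting a wall counts as a crash."""
    vx, vy = vel[0] + accel[0], vel[1] + accel[1]
    x, y = pos[0] + vx, pos[1] + vy
    on_map = 0 <= y < len(MAP) and 0 <= x < len(MAP[y])
    cell = MAP[y][x] if on_map else "#"
    return (x, y), (vx, vy), cell == "#", cell == "G"

if __name__ == "__main__":
    # Example: from the start cell, keep accelerating right until goal or crash;
    # unchecked speed build-up makes overshooting (and crashing) easy.
    pos, vel = (1, 1), (0, 0)
    for _ in range(5):
        pos, vel, crashed, goal = step(pos, vel, (1, 0))
        if crashed or goal:
            break
    print("crashed" if crashed else "reached goal" if goal else "still driving")
```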

Updated: 2023-10-26