Optimal Deterministic Controller Synthesis from Steady-State Distributions
Journal of Automated Reasoning (IF 1.1) Pub Date: 2023-01-12, DOI: 10.1007/s10817-022-09657-9
Alvaro Velasquez, Ismail Alkhouri, K. Subramani, Piotr Wojciechowski, George Atia

The formal synthesis of control policies is a classic problem that entails computing optimal strategies for an agent interacting with an environment such that formal guarantees on its behavior are met. These guarantees are specified as part of the problem description and can be supplied by the end user in the form of various logics (e.g., Linear Temporal Logic and Computation Tree Logic) or imposed via constraints on agent-environment interactions. The latter has received significant attention in recent years in the context of constraints on the asymptotic frequency with which an agent visits states of interest, as captured by the agent's steady-state distribution. The formal synthesis of stochastic policies satisfying constraints on this distribution has been studied; however, the derivation of deterministic policies satisfying such constraints has received little attention. In this paper, we focus on this deterministic steady-state control problem, i.e., the problem of obtaining a deterministic policy with optimal expected reward subject to linear constraints representing the desired steady-state behavior. Two integer linear programs are proposed to solve this problem in unichain and multichain Markov decision processes (MDPs), respectively, and are validated experimentally. Finally, we prove that this problem is NP-hard even in the restricted setting where the MDP has deterministic transitions and only two actions.
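
To make the optimization concrete, the following is a standard occupation-measure sketch of the deterministic steady-state control problem for the unichain case. It is a plausible reconstruction from the abstract rather than one of the paper's two programs, and all symbol names are illustrative: x_{s,a} denotes the stationary state-action frequency, P(s' | s, a) the transition kernel, R(s, a) the reward, [ℓ_s, u_s] the desired bounds on the steady-state probability of each state s in a specification set S_⋆, and Δ_{s,a} a binary action-selector variable.

\begin{align*}
\max_{x,\,\Delta}\quad & \sum_{s,a} R(s,a)\, x_{s,a} \\
\text{s.t.}\quad & \sum_{a} x_{s,a} \;=\; \sum_{s',a'} P(s \mid s',a')\, x_{s',a'} && \forall s \\
& \sum_{s,a} x_{s,a} \;=\; 1, \qquad x_{s,a} \;\ge\; 0 && \forall s,a \\
& \ell_s \;\le\; \sum_{a} x_{s,a} \;\le\; u_s && \forall s \in S_\star \\
& x_{s,a} \;\le\; \Delta_{s,a}, \qquad \sum_{a} \Delta_{s,a} \;=\; 1, \qquad \Delta_{s,a} \in \{0,1\} && \forall s,a
\end{align*}

Dropping the binary variables Δ_{s,a} recovers the classical linear program for stochastic steady-state control, from which a policy π(a | s) = x_{s,a} / Σ_{a'} x_{s,a'} can be read off; it is the one-action-per-state requirement that introduces integrality, consistent with the NP-hardness result above. (The bound x_{s,a} ≤ Δ_{s,a} suffices as a big-M constraint because Σ_{s,a} x_{s,a} = 1 already forces x_{s,a} ≤ 1.) The multichain case requires extra bookkeeping for transient states and multiple recurrent classes, which this sketch omits.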

Updated: 2023-01-13