Optimization algorithm for feedback and feedforward policies towards robot control robust to sensing failures
ROBOMECH Journal, Pub Date: 2022-07-14, DOI: 10.1186/s40648-022-00232-w
Taisuke Kobayashi, Kenta Yoshizawa

Model-free or learning-based control, in particular reinforcement learning (RL), is expected to be applied to complex robotic tasks. Traditional RL requires the policy to be optimized to be state-dependent; that is, the policy is a kind of feedback (FB) controller. Because such an FB controller requires correct state observation, it is sensitive to sensing failures. To alleviate this drawback of FB controllers, feedback error learning integrates an FB controller with a feedforward (FF) controller. RL could be improved by handling the FB/FF policies together, but, to the best of our knowledge, no methodology for learning them in a unified manner has been developed. In this paper, we propose a new optimization problem for optimizing both the FB and FF policies simultaneously. Inspired by control as inference, the proposed optimization problem considers minimization/maximization of divergences between trajectories: one predicted by the composed policy together with a stochastic dynamics model, and the others inferred as the optimal/non-optimal trajectories. By approximating the stochastic dynamics model with a variational method, we naturally derive a regularization between the FB and FF policies. In numerical simulations and a robot experiment, we verified that the proposed method can stably optimize the composed policy even with a learning law different from that of traditional RL. In addition, we demonstrated that the FF policy is robust to sensing failures and can maintain the optimal motion.
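To make the robustness claim concrete, the sketch below shows a toy composed FB/FF policy in Python: the FB term depends on the observed state, while the FF term depends only on time, so the controller can still output the feedforward action when the state observation is lost. The class, gains, and one-dimensional dynamics are hypothetical illustrations and are not taken from the paper; they only mirror the policy structure described in the abstract, not its learning algorithm.

import numpy as np

class ComposedPolicy:
    """Toy composed feedback (FB) / feedforward (FF) policy.

    The FB policy maps the observed state to an action; the FF policy depends
    only on time, so it remains usable when state sensing fails. In the paper
    both components are optimized jointly; here they are fixed toy functions.
    """

    def __init__(self, fb_gain=1.0, ff_trajectory=None):
        self.fb_gain = fb_gain                    # toy proportional FB policy
        self.ff_trajectory = ff_trajectory or {}  # time -> action lookup (toy FF policy)

    def action(self, t, state=None):
        u_ff = self.ff_trajectory.get(t, 0.0)     # state-independent FF term
        if state is None:                         # sensing failure: FB term unavailable
            return u_ff
        u_fb = -self.fb_gain * state              # state-dependent FB term
        return u_fb + u_ff


# Toy usage: 1-D integrator x_{t+1} = x_t + u_t, regulating x toward 0.
policy = ComposedPolicy(fb_gain=0.5, ff_trajectory={0: -0.2, 1: -0.1})
x = 1.0
for t in range(5):
    obs = x if t < 3 else None                    # sensing fails from t = 3 onward
    u = policy.action(t, obs)
    x = x + u
    print(f"t={t} obs={obs} u={u:.2f} x={x:.2f}")

When obs becomes None (mimicking a dropped sensor reading), only the FF term is applied, which is the behavior the abstract reports as robust to sensing failures.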

Updated: 2022-07-15