Optimized tracking control using reinforcement learning and backstepping technique for canonical nonlinear unknown dynamic system
Optimal Control Applications and Methods (IF 1.8), Pub Date: 2024-02-26, DOI: 10.1002/oca.3115
Yanfen Song, Zijun Li, Guoxing Wen

This work addresses the optimized tracking control problem for a canonical nonlinear dynamic system with unknown dynamics by combining reinforcement learning (RL) with the backstepping technique. Since such a system contains multiple state variables linked by differential relations, the backstepping technique is applied by constructing a sequence of virtual controls in accordance with Lyapunov functions. In the final backstepping step, the optimized actual control is derived by performing RL under an identifier-critic-actor structure, where RL is used to overcome the difficulty of solving the Hamilton-Jacobi-Bellman (HJB) equation. Unlike traditional RL optimization methods, which derive the updating laws from the square of the HJB equation's approximation, this optimized control derives the RL training laws from the negative gradient of a simple positive definite function that is equivalent to the HJB equation. The result shows that this optimized control significantly reduces the algorithm's complexity and, at the same time, removes the requirement of known dynamics. Finally, theoretical analysis and simulation demonstrate the feasibility of this optimized control.
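A rough sketch of the setup suggested by the abstract follows; the paper's exact formulation may differ, and the symbols used here ($f$, $y_d$, $z_i$, $\alpha_i$, $V_i$, $P$, $\gamma_c$, $\gamma_a$) are illustrative assumptions rather than the authors' notation. A canonical (Brunovsky-form) nonlinear system with unknown dynamics can be written as

$$\dot{x}_i = x_{i+1}, \quad i = 1, \dots, n-1, \qquad \dot{x}_n = f(x) + u,$$

with tracking errors $z_1 = x_1 - y_d$ and $z_i = x_i - \alpha_{i-1}$ defined through virtual controls $\alpha_i$ chosen from Lyapunov functions $V_i = \tfrac{1}{2} z_i^2$. In the final step, the optimal value function $V^*$ of a cost such as $J = \int_t^{\infty} \big(z_n^2 + u^2\big)\, d\tau$ satisfies the HJB equation

$$z_n^2 + (u^*)^2 + \frac{dV^*}{dz_n}\,\dot{z}_n = 0.$$

Rather than training the critic and actor weights $\hat{W}_c$, $\hat{W}_a$ by minimizing the squared HJB residual, the updating laws would then be taken along the negative gradient of a positive definite function $P(\hat{W}_c, \hat{W}_a)$ whose minimum coincides with the HJB equation being satisfied, for example

$$\dot{\hat{W}}_c = -\gamma_c\, \frac{\partial P}{\partial \hat{W}_c}, \qquad \dot{\hat{W}}_a = -\gamma_a\, \frac{\partial P}{\partial \hat{W}_a}.$$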
