
Reinforcement Learning for Model Problems of Optimal Control

  • ARTIFICIAL INTELLIGENCE
  • Published in: Journal of Computer and Systems Sciences International

Abstract

Functionals of dynamical systems of various types are optimized using modern reinforcement learning methods. The linear resource allocation problem is considered, along with the optimal consumption problem and its stochastic modifications. The reinforcement learning is based on policy gradient methods.
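To make the approach concrete, the following is a minimal REINFORCE-style policy-gradient sketch for a discretized optimal consumption problem. The wealth dynamics x_{t+1} = x_t - u_t, the log-utility reward, the Gaussian exploration policy, and all parameter values are assumptions chosen only to keep the example self-contained; the code does not reproduce the authors' implementation.

    # Illustrative REINFORCE sketch for a discretized optimal consumption problem:
    # maximize the sum of log-utilities of consumption u_t under x_{t+1} = x_t - u_t.
    # All names and parameters are assumptions, not the authors' setup.
    import numpy as np

    rng = np.random.default_rng(0)
    T = 10                # horizon
    theta = np.zeros(2)   # linear policy parameters: consume sigmoid(theta.[x, 1]) of wealth
    sigma = 0.1           # exploration noise of the stochastic policy
    alpha = 0.05          # learning rate

    def rollout(theta):
        """Simulate one episode; return the total reward and the log-prob gradients."""
        x = 1.0                      # initial wealth
        grads, reward = [], 0.0
        for _ in range(T):
            feats = np.array([x, 1.0])
            mean = 1.0 / (1.0 + np.exp(-theta @ feats))   # fraction of wealth to consume
            a = np.clip(mean + sigma * rng.standard_normal(), 1e-3, 1.0 - 1e-3)
            u = a * x                                      # consumption
            reward += np.log(u + 1e-8)                     # log-utility
            # gradient of the Gaussian log-density wrt theta (chain rule through the sigmoid)
            grads.append(((a - mean) / sigma**2) * mean * (1.0 - mean) * feats)
            x = x - u                                      # wealth dynamics
        return reward, grads

    baseline = 0.0
    for episode in range(2000):
        R, grads = rollout(theta)
        baseline += 0.01 * (R - baseline)                  # running baseline reduces variance
        theta += alpha * (R - baseline) * np.sum(grads, axis=0)

    print("learned policy parameters:", theta)

The episode return weights the summed score-function gradients, and a running baseline is subtracted to reduce the variance of the gradient estimate; more elaborate actor-critic or PPO-style schemes follow the same pattern.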






Author information

Corresponding authors

Correspondence to S. S. Semenov or V. I. Tsurkov.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Publisher’s Note.

Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Semenov, S.S., Tsurkov, V.I. Reinforcement Learning for Model Problems of Optimal Control. J. Comput. Syst. Sci. Int. 62, 508–521 (2023). https://doi.org/10.1134/S1064230723030127


