
Reinforcement Learning for Model Problems of Optimal Control

  • ARTIFICIAL INTELLIGENCE
  • Published in: Journal of Computer and Systems Sciences International

Abstract

Functionals of dynamical systems of various types are optimized using modern reinforcement learning methods. The linear resource allocation problem is considered, along with the optimal consumption problem and its stochastic modifications. The reinforcement learning is based on policy gradient methods.
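To make the approach concrete, the following is a minimal REINFORCE-style policy-gradient sketch for a discretized optimal consumption problem. The wealth dynamics x_{t+1} = x_t - u_t, the log-utility reward, the Gaussian exploration policy, and all parameter values are assumptions chosen only to keep the example self-contained; the code does not reproduce the authors' implementation.

    # Illustrative REINFORCE sketch for a discretized optimal consumption problem:
    # maximize the sum of log-utilities of consumption u_t under x_{t+1} = x_t - u_t.
    # All names and parameters are assumptions, not the authors' setup.
    import numpy as np

    rng = np.random.default_rng(0)
    T = 10                # horizon
    theta = np.zeros(2)   # linear policy parameters: consume sigmoid(theta.[x, 1]) of wealth
    sigma = 0.1           # exploration noise of the stochastic policy
    alpha = 0.05          # learning rate

    def rollout(theta):
        """Simulate one episode; return the total reward and the log-prob gradients."""
        x = 1.0                      # initial wealth
        grads, reward = [], 0.0
        for _ in range(T):
            feats = np.array([x, 1.0])
            mean = 1.0 / (1.0 + np.exp(-theta @ feats))   # fraction of wealth to consume
            a = np.clip(mean + sigma * rng.standard_normal(), 1e-3, 1.0 - 1e-3)
            u = a * x                                      # consumption
            reward += np.log(u + 1e-8)                     # log-utility
            # gradient of the Gaussian log-density wrt theta (chain rule through the sigmoid)
            grads.append(((a - mean) / sigma**2) * mean * (1.0 - mean) * feats)
            x = x - u                                      # wealth dynamics
        return reward, grads

    baseline = 0.0
    for episode in range(2000):
        R, grads = rollout(theta)
        baseline += 0.01 * (R - baseline)                  # running baseline reduces variance
        theta += alpha * (R - baseline) * np.sum(grads, axis=0)

    print("learned policy parameters:", theta)

The episode return weights the summed score-function gradients, and a running baseline is subtracted to reduce the variance of the gradient estimate; more elaborate actor-critic or PPO-style schemes follow the same pattern.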






Author information

Corresponding authors

Correspondence to S. S. Semenov or V. I. Tsurkov.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Publisher’s Note.

Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Semenov, S.S., Tsurkov, V.I. Reinforcement Learning for Model Problems of Optimal Control. J. Comput. Syst. Sci. Int. 62, 508–521 (2023). https://doi.org/10.1134/S1064230723030127


