A projected primal-dual gradient optimal control method for deep reinforcement learning

Cited by: 0
Authors
Simon Gottschalk
Michael Burger
Matthias Gerdts
Affiliations
[1] Fraunhofer ITWM
[2] Universität der Bundeswehr
Keywords
Reinforcement learning; Optimal control; Necessary optimality conditions; MSC: 49K15; 90C40; 93E35
Abstract
In this contribution, we start with a policy-based reinforcement learning ansatz using neural networks. The underlying Markov decision process consists of a transition probability representing the dynamical system and a policy realized by a neural network that maps the current state to the parameters of a distribution, from which the next control is sampled. In this setting, the neural network is replaced by an ODE, building on a recently discussed interpretation of neural networks. The resulting infinite-dimensional optimization problem is transformed into one resembling the well-known optimal control problems. The necessary optimality conditions are then established, and from these a new numerical algorithm is derived. The operating principle is demonstrated with two examples. In the first, a simple one, a moving point is steered through an obstacle course to a desired end position in a 2D plane. The second example shows the applicability to more complex problems: there, the aim is to steer the fingertip of a human arm model, with five degrees of freedom and 29 Hill muscle models, to a desired end position.
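The pipeline the abstract describes (a policy that maps the state to the parameters of a control distribution, realized here by an ODE, plus a projected gradient step on the policy parameters) can be sketched as follows. Everything below is an illustrative assumption, not the paper's formulation: the `tanh` right-hand side, the box constraints, the finite-difference gradient, and all names are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_mean(state, theta, n_steps=10, h=0.1):
    """Propagate the state through z' = tanh(theta @ z) by explicit Euler
    and return the terminal value as the mean of the control distribution
    (the ODE here plays the role of the neural network)."""
    z = state.copy()
    for _ in range(n_steps):
        z = z + h * np.tanh(theta @ z)
    return z

def sample_control(state, theta, sigma=0.1):
    """Sample the next control from the Gaussian policy."""
    return rng.normal(policy_mean(state, theta), sigma)

def project(theta, lo=-1.0, hi=1.0):
    """Projection onto box constraints: the 'projected' part of the method."""
    return np.clip(theta, lo, hi)

# Toy version of the paper's first example: steer a 2D point toward a target.
start = np.array([0.2, -0.3])
target = np.array([1.0, 0.5])
theta = rng.normal(scale=0.1, size=(2, 2))

def cost(theta):
    u = policy_mean(start, theta)   # deterministic surrogate for the mean
    return 0.5 * np.sum((u - target) ** 2)

# Finite-difference gradient, for illustration only; the paper instead
# derives the gradient from the necessary optimality conditions.
eps = 1e-6
grad = np.zeros_like(theta)
for idx in np.ndindex(*theta.shape):
    d = np.zeros_like(theta)
    d[idx] = eps
    grad[idx] = (cost(theta + d) - cost(theta - d)) / (2 * eps)

c0 = cost(theta)
theta = project(theta - 0.1 * grad)  # one projected gradient step
```

The finite-difference stand-in mimics only the primal step; the primal-dual character of the paper's method comes from the adjoint (dual) variables of the optimality system, which this sketch does not model.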
Published in: Journal of Mathematics in Industry, 2020, 10(1)