A projected primal-dual gradient optimal control method for deep reinforcement learning

Cited by: 0
Authors
Simon Gottschalk
Michael Burger
Matthias Gerdts
Affiliations
[1] Fraunhofer ITWM
[2] Universität der Bundeswehr
Keywords
Reinforcement learning; Optimal control; Necessary optimality conditions; MSC: 49K15; 90C40; 93E35
Abstract
In this contribution, we start with a policy-based reinforcement learning ansatz using neural networks. The underlying Markov decision process consists of a transition probability representing the dynamical system and a policy realized by a neural network that maps the current state to the parameters of a distribution, from which the next control is sampled. In this setting, the neural network is replaced by an ODE, building on a recently discussed interpretation of neural networks. The resulting infinite-dimensional optimization problem is transformed into one resembling the well-known optimal control problems. The necessary optimality conditions are then established, and from these a new numerical algorithm is derived. The operating principle is demonstrated with two examples. In the first, a simple one, a moving point is steered through an obstacle course to a desired end position in a 2D plane. The second example shows the applicability to more complex problems: there, the aim is to steer the fingertip of a human arm model, with five degrees of freedom and 29 Hill muscle models, to a desired end position.
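The pipeline the abstract describes (a policy that maps the state to the parameters of a control distribution, realized here by an ODE, plus a projected gradient step on the policy parameters) can be sketched as follows. Everything below is an illustrative assumption, not the paper's formulation: the `tanh` right-hand side, the box constraints, the finite-difference gradient, and all names are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_mean(state, theta, n_steps=10, h=0.1):
    """Propagate the state through z' = tanh(theta @ z) by explicit Euler
    and return the terminal value as the mean of the control distribution
    (the ODE here plays the role of the neural network)."""
    z = state.copy()
    for _ in range(n_steps):
        z = z + h * np.tanh(theta @ z)
    return z

def sample_control(state, theta, sigma=0.1):
    """Sample the next control from the Gaussian policy."""
    return rng.normal(policy_mean(state, theta), sigma)

def project(theta, lo=-1.0, hi=1.0):
    """Projection onto box constraints: the 'projected' part of the method."""
    return np.clip(theta, lo, hi)

# Toy version of the paper's first example: steer a 2D point toward a target.
start = np.array([0.2, -0.3])
target = np.array([1.0, 0.5])
theta = rng.normal(scale=0.1, size=(2, 2))

def cost(theta):
    u = policy_mean(start, theta)   # deterministic surrogate for the mean
    return 0.5 * np.sum((u - target) ** 2)

# Finite-difference gradient, for illustration only; the paper instead
# derives the gradient from the necessary optimality conditions.
eps = 1e-6
grad = np.zeros_like(theta)
for idx in np.ndindex(*theta.shape):
    d = np.zeros_like(theta)
    d[idx] = eps
    grad[idx] = (cost(theta + d) - cost(theta - d)) / (2 * eps)

c0 = cost(theta)
theta = project(theta - 0.1 * grad)  # one projected gradient step
```

The finite-difference stand-in mimics only the primal step; the primal-dual character of the paper's method comes from the adjoint (dual) variables of the optimality system, which this sketch does not model.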
Published in: Journal of Mathematics in Industry, 2020, 10(1)