Bias Correction in Deterministic Policy Gradient Using Robust MPC

Cited by: 0
Authors
Kordabad, Arash Bahari [1]
Esfahani, Hossein Nejatbakhsh [1]
Gros, Sebastien [1]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Dept Engn Cybernet, Trondheim, Norway
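Source
2021 EUROPEAN CONTROL CONFERENCE (ECC) | 2021
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract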
In this paper, we discuss the deterministic policy gradient using Actor-Critic methods based on the linear compatible advantage function approximator, where the input spaces are continuous. When the policy is restricted by hard constraints, the exploration may not be Centered or Isotropic (non-CI). As a result, the policy gradient estimation can be biased. We focus on constrained policies based on Model Predictive Control (MPC) schemes and, to address the bias issue, we propose an approximate Robust MPC (RMPC) approach accounting for the exploration. The RMPC-based policy ensures that a Centered and Isotropic (CI) exploration is approximately feasible. A posterior projection is used to ensure its exact feasibility, and we formally prove that this approach does not bias the gradient estimation.
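The problem the abstract describes, exploration that stops being centered once hard constraints are enforced, can be illustrated with a minimal one-dimensional sketch. This is not the paper's RMPC method: it assumes a scalar action, zero-mean Gaussian exploration, and a simple clip to a box constraint as a stand-in for a hard input constraint. The function name `clipped_exploration` and all numerical values are hypothetical, chosen only for illustration. Being scalar, the sketch shows only the loss of centering, not the loss of isotropy.

```python
import random
import statistics


def clipped_exploration(policy_action, lower, upper, sigma, n_samples, seed=0):
    """Return the effective perturbations applied around policy_action.

    Assumption (illustrative only): exploration is zero-mean Gaussian noise
    added to the policy action, and the explored action is clipped to
    [lower, upper] to respect a hard input constraint.
    """
    rng = random.Random(seed)
    perturbations = []
    for _ in range(n_samples):
        explored = policy_action + rng.gauss(0.0, sigma)
        applied = min(max(explored, lower), upper)  # enforce the hard constraint
        perturbations.append(applied - policy_action)
    return perturbations


# Policy action far from the bounds: clipping is essentially never active.
interior = clipped_exploration(0.0, -1.0, 1.0, sigma=0.1, n_samples=20000)
# Policy action close to the upper bound: clipping is frequently active.
near_bound = clipped_exploration(0.95, -1.0, 1.0, sigma=0.1, n_samples=20000)

interior_mean = statistics.fmean(interior)
near_bound_mean = statistics.fmean(near_bound)

print(f"mean perturbation, interior action:   {interior_mean:+.4f}")
print(f"mean perturbation, near-bound action: {near_bound_mean:+.4f}")
```

In the interior case the mean perturbation stays close to zero, while near the bound it is pulled toward the feasible side, so the exploration is no longer centered around the policy action. This is the kind of non-CI exploration that the abstract identifies as a source of bias in the policy gradient estimate.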
Pages: 1086 - 1091
Page count: 6
Related papers
50 in total
  • [41] Friend-or-Foe Deep Deterministic Policy Gradient
    Jiang, Hao
    Shi, Dianxi
    Xue, Chao
    Wang, Yajie
    Wang, Gongju
    Zhang, Yongjun
    2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, : 3523 - 3530
  • [42] Deep Deterministic Policy Gradient for Nested Parallel Negotiation
    Arakawa, Ryota
    Fujita, Katsuhide
    2023 IEEE INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WI-IAT, 2023, : 197 - 204
  • [43] Deep Deterministic Policy Gradient With Classified Experience Replay
    Shi S.-M.
    Liu Q.
    Zidonghua Xuebao/Acta Automatica Sinica, 2022, 48 (07) : 1816 - 1823
  • [44] Hierarchical Intermittent Motor Control With Deterministic Policy Gradient
    Shi, Haibo
    Sun, Yaoru
    Li, Guangyuan
    Wang, Fang
    Wang, Daming
    Li, Jie
    IEEE ACCESS, 2019, 7 : 41799 - 41810
  • [45] Deep Deterministic Policy Gradient with Clustered Prioritized Sampling
    Wu, Wen
    Zhu, Fei
    Fu, YuChen
    Liu, Quan
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302 : 645 - 654
  • [46] Deep Deterministic Policy Gradient With Compatible Critic Network
    Wang, Di
    Hu, Mengqi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (08) : 4332 - 4344
  • [47] Deep deterministic policy gradient algorithm: A systematic review
    Sumiea, Ebrahim Hamid
    Abdulkadir, Said Jadid
    Alhussian, Hitham Seddig
    Al-Selwi, Safwan Mahmood
    Alqushaibi, Alawi
    Ragab, Mohammed Gamal
    Fati, Suliman Mohamed
    HELIYON, 2024, 10 (09)
  • [48] Quasi-Newton Iteration in Deterministic Policy Gradient
    Kordabad, Arash Bahari
    Esfahani, Hossein Nejatbakhsh
    Cai, Wenqi
    Gros, Sebastien
    2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 2124 - 2129
  • [49] Alternated Greedy-Step Deterministic Policy Gradient
    Wang, Xuesong
    Zhang, Jiazhi
    Gu, Yang
    Huang, Longyang
    Yu, Kun
    Cheng, Yuhu
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (04) : 2190 - 2201
  • [50] Deep deterministic policy gradient algorithm for UAV control
    Huang X.
    Liu J.
    Jia C.
    Wang Z.
    Zhang J.
    Hangkong Xuebao/Acta Aeronautica et Astronautica Sinica, 2021, 42 (11):