Bias Correction in Deterministic Policy Gradient Using Robust MPC

Cited by: 0
Authors
Kordabad, Arash Bahari [1]
Esfahani, Hossein Nejatbakhsh [1]
Gros, Sebastien [1]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Dept Engn Cybernet, Trondheim, Norway
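Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract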
In this paper, we discuss the deterministic policy gradient using actor-critic methods based on the linear compatible advantage function approximator, where the input spaces are continuous. When the policy is restricted by hard constraints, the exploration may fail to be Centered and Isotropic (non-CI). As a result, the policy gradient estimation can be biased. We focus on constrained policies based on Model Predictive Control (MPC) schemes and, to address the bias issue, we propose an approximate Robust MPC (RMPC) approach accounting for the exploration. The RMPC-based policy ensures that a Centered and Isotropic (CI) exploration is approximately feasible. A posterior projection is used to ensure its exact feasibility, and we formally prove that this approach does not bias the gradient estimation.
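The non-CI effect mentioned in the abstract can be illustrated with a small numerical sketch. This is not the paper's RMPC method: it assumes simple box constraints enforced by clipping, Gaussian exploration, and a hypothetical helper `exploration_stats`, all chosen only for illustration. It compares the mean and covariance of the effective exploration when the nominal action lies in the interior of the feasible set and when it sits on a bound.

```python
import numpy as np

def exploration_stats(pi, sigma, lower, upper, n=200_000, seed=0):
    """Return mean and covariance of the effective exploration a - pi.

    Assumption for illustration: the hard constraint is a box and is
    enforced by clipping a = clip(pi + e), with e ~ N(0, sigma^2 I).
    """
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, sigma, size=(n, pi.size))  # CI exploration before constraints
    a = np.clip(pi + e, lower, upper)              # constrained action
    d = a - pi                                     # exploration actually applied
    return d.mean(axis=0), np.cov(d, rowvar=False)

# Nominal action well inside the box: exploration stays centered and isotropic.
mean_in, cov_in = exploration_stats(np.array([0.0, 0.0]), 0.1, -1.0, 1.0)

# First action component on the upper bound: exploration is shifted and squeezed.
mean_bd, cov_bd = exploration_stats(np.array([1.0, 0.0]), 0.1, -1.0, 1.0)

print("interior mean:", mean_in, "variances:", np.diag(cov_in))
print("boundary mean:", mean_bd, "variances:", np.diag(cov_bd))
```

On the bound, the first component of the exploration has a negative mean and a smaller variance than the second, so the exploration is neither centered nor isotropic. This is the situation in which, according to the abstract, the policy gradient estimate can become biased.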
Pages: 1086-1091
Number of pages: 6
Related papers
50 in total
  • [1] Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies
    Gros, Sebastien
    Zanon, Mario
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 2543 - 2548
  • [2] A Painless Deterministic Policy Gradient Method for Learning-based MPC
    Anand, Akhil S.
    Reinhardt, Dirk
    Sawant, Shambhuraj
    Gravdahl, Jan Tommy
    Gros, Sebastien
    2023 EUROPEAN CONTROL CONFERENCE, ECC, 2023,
  • [3] BYZANTINE-ROBUST FEDERATED DEEP DETERMINISTIC POLICY GRADIENT
    Lin, Qifeng
    Ling, Qing
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4013 - 4017
  • [4] Deterministic Policy Gradient With Integral Compensator for Robust Quadrotor Control
    Wang, Yuanda
    Sun, Jia
    He, Haibo
    Sun, Changyin
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (10) : 3713 - 3725
  • [5] Combining Q-learning and Deterministic Policy Gradient for Learning-based MPC
    Seel, Katrine
    Gros, Sebastien
    Gravdahl, Jan Tommy
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 610 - 617
  • [6] Developing Flight Control Policy Using Deep Deterministic Policy Gradient
    Tsourdos, Antonios
    Permana, Adhi Dharma
    Budiarti, Dewi H.
    Shin, Hyo-Sang
    Lee, Chang-Hun
    2019 IEEE INTERNATIONAL CONFERENCE ON AEROSPACE ELECTRONICS AND REMOTE SENSING TECHNOLOGY (ICARES 2019), 2019,
  • [7] Deterministic Policy Gradient Algorithms
    Silver, David
    Lever, Guy
    Heess, Nicolas
    Degris, Thomas
    Wierstra, Daan
    Riedmiller, Martin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
  • [8] Proximal Deterministic Policy Gradient
    Maggipinto, Marco
    Susto, Gian Antonio
    Chaudhari, Pratik
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 5438 - 5444
  • [9] Alleviating the estimation bias of deep deterministic policy gradient via co-regularization
    Li, Yao
    Wang, YuHui
    Gan, YaoZhong
    Tan, XiaoYang
    PATTERN RECOGNITION, 2022, 131
  • [10] Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient
    Wu, Dongming
    Dong, Xingping
    Shen, Jianbing
    Hoi, Steven C. H.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (11) : 4933 - 4945