Bias Correction in Deterministic Policy Gradient Using Robust MPC

Cited by: 0

Authors:

Kordabad, Arash Bahari ^{[1]}

Esfahani, Hossein Nejatbakhsh ^{[1]}

Gros, Sebastien ^{[1]}

Affiliations:

[1] Norwegian Univ Sci & Technol NTNU, Dept Engn Cybernet, Trondheim, Norway

Source:

2021 EUROPEAN CONTROL CONFERENCE (ECC) | 2021

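Keywords:

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract: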

In this paper, we discuss the deterministic policy gradient using Actor-Critic methods based on the linear compatible advantage function approximator, where the input spaces are continuous. When the policy is restricted by hard constraints, the exploration may not be Centered or Isotropic (non-CI). As a result, the policy gradient estimation can be biased. We focus on constrained policies based on Model Predictive Control (MPC) schemes and, to address the bias issue, we propose an approximate Robust MPC (RMPC) approach accounting for the exploration. The RMPC-based policy ensures that a Centered and Isotropic (CI) exploration is approximately feasible. A posterior projection is used to ensure its exact feasibility, and we formally prove that this approach does not bias the gradient estimation.
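
The non-centered exploration described in the abstract can be illustrated numerically. The sketch below is not the paper's method: it assumes a scalar action, a box constraint, zero-mean Gaussian exploration, and clipping as the projection, and the function name `clipped_exploration` and all numeric values are hypothetical. It only shows that projecting exploration onto a hard constraint shifts its mean away from zero when the policy action sits on the bound, and that backing the action off from the bound (a crude stand-in for constraint tightening, not an RMPC formulation) makes the projected exploration approximately centered again.

```python
import random
import statistics


def clipped_exploration(policy_action, sigma, lower, upper, n_samples, seed=0):
    """Return deviations a - policy_action, where a is the clipped explored action.

    Assumption (illustrative only): exploration is e ~ N(0, sigma^2) and the
    hard constraint lower <= a <= upper is enforced by clipping.
    """
    rng = random.Random(seed)
    deviations = []
    for _ in range(n_samples):
        explored = policy_action + rng.gauss(0.0, sigma)
        feasible = min(max(explored, lower), upper)  # projection onto the box
        deviations.append(feasible - policy_action)
    return deviations


SIGMA = 0.1        # hypothetical exploration standard deviation
N_SAMPLES = 20000  # hypothetical sample size

# Policy action far from the bounds: clipping almost never activates.
mean_interior = statistics.fmean(
    clipped_exploration(0.0, SIGMA, -1.0, 1.0, N_SAMPLES))

# Policy action exactly on the upper bound: every positive disturbance is cut off,
# so the realized exploration has a negative mean (it is no longer centered).
mean_boundary = statistics.fmean(
    clipped_exploration(1.0, SIGMA, -1.0, 1.0, N_SAMPLES))

# Policy action backed off by 3 * sigma from the bound: clipping is rare again.
mean_backoff = statistics.fmean(
    clipped_exploration(1.0 - 3.0 * SIGMA, SIGMA, -1.0, 1.0, N_SAMPLES))

print(f"interior: {mean_interior:+.4f}")
print(f"boundary: {mean_boundary:+.4f}")
print(f"back-off: {mean_backoff:+.4f}")
```

Under these assumptions the boundary case has a mean deviation close to `-SIGMA / sqrt(2 * pi)`, while the interior and backed-off cases stay close to zero.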


Pages: 1086 - 1091

Number of pages: 6

50 entries in total

[1] Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies
Gros, Sebastien
Zanon, Mario
2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 2543 - 2548
[2] A Painless Deterministic Policy Gradient Method for Learning-based MPC
Anand, Akhil S.
Reinhardt, Dirk
Sawant, Shambhuraj
Gravdahl, Jan Tommy
Gros, Sebastien
2023 EUROPEAN CONTROL CONFERENCE, ECC, 2023,
[3] BYZANTINE-ROBUST FEDERATED DEEP DETERMINISTIC POLICY GRADIENT
Lin, Qifeng
Ling, Qing
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4013 - 4017
[4] Deterministic Policy Gradient With Integral Compensator for Robust Quadrotor Control
Wang, Yuanda
Sun, Jia
He, Haibo
Sun, Changyin
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (10) : 3713 - 3725
[5] Combining Q-learning and Deterministic Policy Gradient for Learning-based MPC
Seel, Katrine
Gros, Sebastien
Gravdahl, Jan Tommy
2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 610 - 617
[6] Developing Flight Control Policy Using Deep Deterministic Policy Gradient
Tsourdos, Antonios
Permana, Adhi Dharma
Budiarti, Dewi H.
Shin, Hyo-Sang
Lee, Chang-Hun
2019 IEEE INTERNATIONAL CONFERENCE ON AEROSPACE ELECTRONICS AND REMOTE SENSING TECHNOLOGY (ICARES 2019), 2019,
[7] Deterministic Policy Gradient Algorithms
Silver, David
Lever, Guy
Heess, Nicolas
Degris, Thomas
Wierstra, Daan
Riedmiller, Martin
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
[8] Proximal Deterministic Policy Gradient
Maggipinto, Marco
Susto, Gian Antonio
Chaudhari, Pratik
2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 5438 - 5444
[9] Alleviating the estimation bias of deep deterministic policy gradient via co-regularization
Li, Yao
Wang, YuHui
Gan, YaoZhong
Tan, XiaoYang
PATTERN RECOGNITION, 2022, 131
[10] Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient
Wu, Dongming
Dong, Xingping
Shen, Jianbing
Hoi, Steven C. H.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (11) : 4933 - 4945
