Bias Correction in Deterministic Policy Gradient Using Robust MPC

Cited by: 0

Authors:

Kordabad, Arash Bahari ^{[1]}

Esfahani, Hossein Nejatbakhsh ^{[1]}

Gros, Sebastien ^{[1]}

Affiliation:

[1] Norwegian Univ Sci & Technol NTNU, Dept Engn Cybernet, Trondheim, Norway

Source:

2021 EUROPEAN CONTROL CONFERENCE (ECC) | 2021

Keywords:

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this paper, we discuss the deterministic policy gradient using Actor-Critic methods based on a linear compatible advantage function approximator, where the input spaces are continuous. When the policy is restricted by hard constraints, the exploration may not be Centered and Isotropic (non-CI). As a result, the policy gradient estimation can be biased. We focus on constrained policies based on Model Predictive Control (MPC) schemes. To address the bias issue, we propose an approximate Robust MPC (RMPC) approach that accounts for the exploration. The RMPC-based policy ensures that a Centered and Isotropic (CI) exploration is approximately feasible. A posterior projection is used to ensure its exact feasibility, and we formally prove that this approach does not bias the gradient estimation.
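
The non-CI effect described in the abstract can be illustrated with a minimal sketch. This is not the paper's RMPC scheme or its projection; it only assumes a scalar action, Gaussian exploration, and a simple box constraint enforced by clipping. The function name `exploration_stats` and all numeric values are hypothetical, chosen for illustration.

```python
import random


def exploration_stats(pi, sigma, lo, hi, n=100_000, seed=0):
    """Return the mean exploration offset before and after clipping.

    pi     : deterministic policy output (scalar action), hypothetical value
    sigma  : standard deviation of the zero-mean Gaussian exploration
    lo, hi : hard bounds on the action (assumed box constraint)
    """
    rng = random.Random(seed)
    raw_sum = 0.0
    clipped_sum = 0.0
    for _ in range(n):
        e = rng.gauss(0.0, sigma)          # centered exploration sample
        a = pi + e                         # explored action, may be infeasible
        a_clipped = min(max(a, lo), hi)    # enforce the hard constraint
        raw_sum += e
        clipped_sum += a_clipped - pi      # exploration actually applied
    return raw_sum / n, clipped_sum / n


# Policy output close to the upper bound: clipping cuts off one tail,
# so the applied exploration is no longer centered.
raw_mean, clipped_mean = exploration_stats(pi=0.9, sigma=0.3, lo=-1.0, hi=1.0)

# Policy output kept away from the bounds (the kind of margin a robust
# formulation aims for): clipping is rarely active and the applied
# exploration stays approximately centered.
backed_off_raw, backed_off_clipped = exploration_stats(
    pi=0.0, sigma=0.3, lo=-1.0, hi=1.0
)

print(f"near bound : raw {raw_mean:+.4f}, clipped {clipped_mean:+.4f}")
print(f"backed off : raw {backed_off_raw:+.4f}, clipped {backed_off_clipped:+.4f}")
```

In this toy setting the clipped exploration near the bound has a clearly negative mean, while the backed-off policy keeps it close to zero. That is the intuition behind keeping CI exploration approximately feasible, under the simplifying assumptions stated above.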

Pages: 1086-1091

Number of pages: 6
