Explorer-Actor-Critic: Better actors for deep reinforcement learning

Cited by: 1
Authors
Zhang, Junwei [1 ,2 ]
Han, Shuai [1 ,2 ,3 ]
Xiong, Xi [1 ,2 ]
Zhu, Sheng [1 ,4 ]
Lu, Shuai [1 ,2 ,4 ]
Affiliations
[1] Jilin Univ, Key Lab Symbol Comp & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[3] Univ Utrecht, Dept Informat & Comp Sci, Utrecht, Netherlands
[4] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Sample efficiency; Overestimation bias; Actor-critic framework; Exploration and exploitation;
DOI
10.1016/j.ins.2024.120255
Chinese Library Classification (CLC)
TP [Automation and computer technology];
Discipline code
0812 ;
Abstract
Actor-critic deep reinforcement learning methods have demonstrated significant performance on many challenging decision-making and control tasks, but they suffer from high sample complexity and overestimation bias. Current research focuses on using underestimation to balance overestimation and on reducing bias through ensemble learning, but these approaches introduce underestimation bias and excessive network costs. In this paper, we first analyze the effect of the action selection policy on estimation bias. We then propose the Explorer-Actor-Critic (EAC) method, which gives the actor a more conservative objective to reduce overestimation, introduces a learnable explorer to improve exploration ability, and uses an action mixing mechanism to mitigate experience distribution bias. Furthermore, we apply the EAC method to TD3 and SAC and verify its effectiveness through extensive comparison and ablation experiments. Our algorithm not only outperforms state-of-the-art algorithms but is also compatible with other actor-critic methods.
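The abstract refers to overestimation bias and a "more conservative objective for the actor"; the paper's exact formulation is not reproduced in this record. As a hedged illustration only (plain NumPy, all variable names hypothetical), the sketch below shows the statistical effect such conservative objectives address: taking the maximum over independently noisy critic estimates systematically overestimates the true action value, while taking the minimum (the clipped double-Q target used by TD3, one of the base algorithms EAC is applied to) errs on the low side instead.

```python
import numpy as np

rng = np.random.default_rng(0)

true_q = 10.0  # hypothetical true action value
# Two critics whose estimates carry independent zero-mean noise.
noise = rng.normal(0.0, 1.0, size=(2, 100_000))
q1, q2 = true_q + noise[0], true_q + noise[1]

# Selecting the larger estimate (as greedy maximization tends to do)
# produces a positive bias; the min-of-two target flips its sign.
optimistic_bias = np.maximum(q1, q2).mean() - true_q
conservative_bias = np.minimum(q1, q2).mean() - true_q

print(f"max-over-critics bias: {optimistic_bias:+.2f}")
print(f"min-over-critics bias: {conservative_bias:+.2f}")
```

For two independent unit-variance Gaussians the expected gap is 1/sqrt(pi) ≈ 0.56 in each direction, which is why methods balancing over- and underestimation (the trade-off the abstract discusses) remain an active topic.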
Pages: 17