Explorer-Actor-Critic: Better actors for deep reinforcement learning

Cited by: 1
Authors
Zhang, Junwei [1,2]
Han, Shuai [1,2,3]
Xiong, Xi [1,2]
Zhu, Sheng [1,4]
Lu, Shuai [1,2,4]
Affiliations
[1] Jilin Univ, Key Lab Symbol Comp & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[3] Univ Utrecht, Dept Informat & Comp Sci, Utrecht, Netherlands
[4] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Sample efficiency; Overestimation bias; Actor-critic framework; Exploration and exploitation;
DOI
10.1016/j.ins.2024.120255
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Number
0812;
Abstract
Actor-critic deep reinforcement learning methods have demonstrated strong performance on many challenging decision-making and control tasks, but they suffer from high sample complexity and overestimation bias. Current research focuses on using underestimation to counterbalance overestimation and on reducing bias through ensemble learning, but these approaches introduce underestimation bias and excessive network costs. In this paper, we first analyze the effect of the action selection policy on estimation bias. We then propose the Explorer-Actor-Critic (EAC) method, which gives the actor a more conservative objective to reduce overestimation, introduces a learnable explorer to improve exploration ability, and uses an action mixing mechanism to mitigate experience distribution bias. Furthermore, we apply EAC to TD3 and SAC and verify its effectiveness through extensive comparison and ablation experiments. Our algorithm not only outperforms state-of-the-art algorithms but is also compatible with other actor-critic methods.
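The abstract only names the three mechanisms, so the following is a minimal Python sketch of how they could fit together on top of TD3. Everything concrete here is an assumption: the network architectures, the use of the minimum over the twin critics as the conservative actor objective and their mean as the less conservative explorer objective, and the mixing probability mix_prob are illustrative placeholders, not the authors' implementation.

# Hypothetical sketch of the EAC idea on top of TD3, reconstructed from the
# abstract alone; all concrete choices below are assumptions.
import torch
import torch.nn as nn

class Policy(nn.Module):
    # Deterministic policy network; used for both the actor and the explorer.
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    # Standard Q-network over (state, action) pairs, as in TD3's twin critics.
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def actor_loss(q1, q2, actor, state):
    # Conservative actor objective (assumed form): maximize the minimum of the
    # twin critics, so the actor cannot exploit whichever critic overestimates.
    action = actor(state)
    return -torch.min(q1(state, action), q2(state, action)).mean()

def explorer_loss(q1, q2, explorer, state):
    # Less conservative explorer objective (assumed form): maximize the mean of
    # the twin critics, steering the explorer toward actions the actor avoids.
    action = explorer(state)
    return -(0.5 * (q1(state, action) + q2(state, action))).mean()

def behavior_action(actor, explorer, state, mix_prob=0.3):
    # Action mixing (assumed form): with probability mix_prob the explorer acts
    # instead of the actor, so the replay buffer covers both policies and the
    # experience distribution is less biased toward the actor alone.
    policy = explorer if torch.rand(()).item() < mix_prob else actor
    with torch.no_grad():
        return policy(state)

In a full agent, the twin critics would be trained exactly as in TD3, the actor and explorer would each be updated on their own loss, and behavior_action would generate the transitions stored in the replay buffer.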
Pages: 17