Explorer-Actor-Critic: Better actors for deep reinforcement learning

Cited by: 1
Authors
Zhang, Junwei [1,2]
Han, Shuai [1,2,3]
Xiong, Xi [1,2]
Zhu, Sheng [1,4]
Lu, Shuai [1,2,4]
Affiliations
[1] Jilin Univ, Key Lab Symbol Comp & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[3] Univ Utrecht, Dept Informat & Comp Sci, Utrecht, Netherlands
[4] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Sample efficiency; Overestimation bias; Actor-critic framework; Exploration and exploitation;
DOI
10.1016/j.ins.2024.120255
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Number
0812;
Abstract
Actor-critic deep reinforcement learning methods have demonstrated strong performance on many challenging decision-making and control tasks, but they suffer from high sample complexity and overestimation bias. Current research focuses on using underestimation to counterbalance overestimation and on reducing bias through ensemble learning, but these approaches introduce underestimation bias and excessive network costs. In this paper, we first analyze the effect of the action selection policy on estimation bias. We then propose the Explorer-Actor-Critic (EAC) method, which gives the actor a more conservative objective to reduce overestimation, introduces a learnable explorer to improve exploration ability, and uses an action mixing mechanism to mitigate experience distribution bias. Furthermore, we apply EAC to TD3 and SAC and verify its effectiveness through extensive comparison and ablation experiments. Our algorithm not only outperforms state-of-the-art algorithms but is also compatible with other actor-critic methods.
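The abstract only names the three mechanisms, so the following is a minimal Python sketch of how they could fit together on top of TD3. Everything concrete here is an assumption: the network architectures, the use of the minimum over the twin critics as the conservative actor objective and their mean as the less conservative explorer objective, and the mixing probability mix_prob are illustrative placeholders, not the authors' implementation.

# Hypothetical sketch of the EAC idea on top of TD3, reconstructed from the
# abstract alone; all concrete choices below are assumptions.
import torch
import torch.nn as nn

class Policy(nn.Module):
    # Deterministic policy network; used for both the actor and the explorer.
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    # Standard Q-network over (state, action) pairs, as in TD3's twin critics.
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def actor_loss(q1, q2, actor, state):
    # Conservative actor objective (assumed form): maximize the minimum of the
    # twin critics, so the actor cannot exploit whichever critic overestimates.
    action = actor(state)
    return -torch.min(q1(state, action), q2(state, action)).mean()

def explorer_loss(q1, q2, explorer, state):
    # Less conservative explorer objective (assumed form): maximize the mean of
    # the twin critics, steering the explorer toward actions the actor avoids.
    action = explorer(state)
    return -(0.5 * (q1(state, action) + q2(state, action))).mean()

def behavior_action(actor, explorer, state, mix_prob=0.3):
    # Action mixing (assumed form): with probability mix_prob the explorer acts
    # instead of the actor, so the replay buffer covers both policies and the
    # experience distribution is less biased toward the actor alone.
    policy = explorer if torch.rand(()).item() < mix_prob else actor
    with torch.no_grad():
        return policy(state)

In a full agent, the twin critics would be trained exactly as in TD3, the actor and explorer would each be updated on their own loss, and behavior_action would generate the transitions stored in the replay buffer.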
Pages: 17