Explorer-Actor-Critic: Better actors for deep reinforcement learning

Cited by: 1
Authors
Zhang, Junwei [1 ,2 ]
Han, Shuai [1 ,2 ,3 ]
Xiong, Xi [1 ,2 ]
Zhu, Sheng [1 ,4 ]
Lu, Shuai [1 ,2 ,4 ]
Affiliations
[1] Jilin Univ, Key Lab Symbol Comp & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[3] Univ Utrecht, Dept Informat & Comp Sci, Utrecht, Netherlands
[4] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Sample efficiency; Overestimation bias; Actor-critic framework; Exploration and exploitation;
DOI
10.1016/j.ins.2024.120255
Chinese Library Classification (CLC)
TP [Automation and computer technology];
Discipline code
0812 ;
Abstract
Actor-critic deep reinforcement learning methods have demonstrated significant performance on many challenging decision-making and control tasks, but they suffer from high sample complexity and overestimation bias. Current research focuses on using underestimation to balance overestimation and on reducing bias through ensemble learning, but these approaches introduce underestimation bias and excessive network costs. In this paper, we first analyze the effect of the action selection policy on estimation bias. We then propose the Explorer-Actor-Critic (EAC) method, which gives the actor a more conservative objective to reduce overestimation, introduces a learnable explorer to improve exploration ability, and uses an action mixing mechanism to mitigate experience distribution bias. Furthermore, we apply the EAC method to TD3 and SAC and verify its effectiveness through extensive comparison and ablation experiments. Our algorithm not only outperforms state-of-the-art algorithms but is also compatible with other actor-critic methods.
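The abstract refers to overestimation bias and a "more conservative objective for the actor"; the paper's exact formulation is not reproduced in this record. As a hedged illustration only (plain NumPy, all variable names hypothetical), the sketch below shows the statistical effect such conservative objectives address: taking the maximum over independently noisy critic estimates systematically overestimates the true action value, while taking the minimum (the clipped double-Q target used by TD3, one of the base algorithms EAC is applied to) errs on the low side instead.

```python
import numpy as np

rng = np.random.default_rng(0)

true_q = 10.0  # hypothetical true action value
# Two critics whose estimates carry independent zero-mean noise.
noise = rng.normal(0.0, 1.0, size=(2, 100_000))
q1, q2 = true_q + noise[0], true_q + noise[1]

# Selecting the larger estimate (as greedy maximization tends to do)
# produces a positive bias; the min-of-two target flips its sign.
optimistic_bias = np.maximum(q1, q2).mean() - true_q
conservative_bias = np.minimum(q1, q2).mean() - true_q

print(f"max-over-critics bias: {optimistic_bias:+.2f}")
print(f"min-over-critics bias: {conservative_bias:+.2f}")
```

For two independent unit-variance Gaussians the expected gap is 1/sqrt(pi) ≈ 0.56 in each direction, which is why methods balancing over- and underestimation (the trade-off the abstract discusses) remain an active topic.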
Pages: 17