Duplicated Replay Buffer for Asynchronous Deep Deterministic Policy Gradient

Cited by: 1
Authors
Motehayeri, Seyed Mohammad Seyed [1 ]
Baghi, Vahid [1 ]
Miandoab, Ehsan Maani [1 ]
Moeini, Ali [1 ]
Affiliations
[1] Univ Tehran, Dept Algorithms & Computat, Tehran, Iran
Source
2021 26TH INTERNATIONAL COMPUTER CONFERENCE, COMPUTER SOCIETY OF IRAN (CSICC) | 2021
Keywords
deep reinforcement learning; experience replay buffer; deep deterministic policy gradient; asynchronous episodic deep deterministic policy gradient;
DOI
10.1109/CSICC52343.2021.9420550
CLC Number
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Off-Policy Deep Reinforcement Learning (DRL) algorithms such as Deep Deterministic Policy Gradient (DDPG) have been used to teach intelligent agents to solve complicated problems in continuous state-action environments. Several methods have been successfully applied to improve the training performance of these algorithms and achieve better speed and stability, such as experience replay, which selects a batch of transitions from the replay memory buffer. However, environments with sparse reward functions remain a challenge for these algorithms and degrade their performance. This research aims to make the transition selection process more efficient by increasing the likelihood of selecting important transitions from the replay memory buffer. Our proposed method works better with sparse reward functions, in particular in environments with termination conditions. We use a secondary replay memory buffer that stores the more critical transitions. In the training process, transitions are sampled from both the primary replay buffer and the secondary replay buffer. We also use parallel environments to asynchronously execute and fill the primary and secondary replay buffers. This method helps us obtain better performance and stability. Finally, we evaluate our proposed approach on the Crawler model, one of the Unity ML-Agents tasks with a sparse reward function, against DDPG and AE-DDPG.
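The abstract's core mechanism, a primary buffer plus a secondary buffer holding critical transitions, with training batches drawn from both, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `DuplicatedReplayBuffer`, the capacities, the mixing fraction, and the "critical" criterion (e.g. terminal or rewarded transitions) are all assumptions made for the example.

```python
import random
from collections import deque

class DuplicatedReplayBuffer:
    """Illustrative sketch: every transition goes into the primary buffer;
    transitions flagged as critical are duplicated into a secondary buffer.
    Sampled batches mix transitions from both buffers."""

    def __init__(self, capacity=100_000, critical_capacity=10_000,
                 critical_fraction=0.5):
        self.primary = deque(maxlen=capacity)       # all transitions
        self.secondary = deque(maxlen=critical_capacity)  # critical ones only
        self.critical_fraction = critical_fraction  # target share of batch from secondary

    def add(self, transition, critical=False):
        # 'critical' is a hypothetical flag, e.g. nonzero reward or episode termination
        self.primary.append(transition)
        if critical:
            self.secondary.append(transition)

    def sample(self, batch_size):
        # Take up to critical_fraction of the batch from the secondary buffer,
        # then fill the rest with uniform samples from the primary buffer.
        n_sec = min(int(batch_size * self.critical_fraction), len(self.secondary))
        batch = random.sample(self.secondary, n_sec)
        batch += random.sample(self.primary, batch_size - n_sec)
        return batch
```

In an asynchronous setting such as the one the abstract describes, parallel environment workers would call `add` concurrently while the learner calls `sample`; the sketch omits that synchronization for clarity.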
Pages: 6
Related Papers (50 records)
  • [1] Deep Deterministic Policy Gradient With Classified Experience Replay
    Shi S.-M.
    Liu Q.
    Zidonghua Xuebao/Acta Automatica Sinica, 2022, 48 (07): 1816 - 1823
  • [2] Deep Deterministic Policy Gradient Based on Double Network Prioritized Experience Replay
    Kang, Chaohai
    Rong, Chuiting
    Ren, Weijian
    Huo, Fengcai
    Liu, Pengyun
    IEEE ACCESS, 2021, 9 : 60296 - 60308
  • [3] Asynchronous Methods for Multi-agent Deep Deterministic Policy Gradient
    Jiang, Xuesong
    Li, Zhipeng
    Wei, Xiumei
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302 : 711 - 721
  • [4] Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay
    Sun, Xiaoying
    Chen, Jinchao
    Du, Chenglie
    Zhan, Mengying
    2022 IEEE 6TH ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2022, : 988 - 992
  • [5] Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay
    Cicek, Dogan C.
    Duran, Enes
    Saglam, Baturay
    Mutlu, Furkan B.
    Kozat, Suleyman S.
    2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 1255 - 1262
  • [6] Deep deterministic policy gradient algorithm based on dung beetle optimization and priority experience replay mechanism
    Zhu, Hengwei
    Rong, Chuiting
    Liu, Haorui
    SCIENTIFIC REPORTS, 15 (1)
  • [7] Efficient experience replay based deep deterministic policy gradient for AGC dispatch in integrated energy system
    Li, Jiawen
    Yu, Tao
    Zhang, Xiaoshun
    Li, Fusheng
    Lin, Dan
    Zhu, Hanxin
    APPLIED ENERGY, 2021, 285
  • [8] MP-TD3: Multi-Pool Prioritized Experience Replay-Based Asynchronous Twin Delayed Deep Deterministic Policy Gradient Algorithm
    Tan, Wenwen
    Huang, Detian
    IEEE ACCESS, 2024, 12 : 105268 - 105280
  • [9] Asynchronous Episodic Deep Deterministic Policy Gradient: Toward Continuous Control in Computationally Complex Environments
    Zhang, Zhizheng
    Chen, Jiale
    Chen, Zhibo
    Li, Weiping
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (02) : 604 - 613
  • [10] Policy Space Noise in Deep Deterministic Policy Gradient
    Yan, Yan
    Liu, Quan
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302 : 624 - 634