Duplicated Replay Buffer for Asynchronous Deep Deterministic Policy Gradient

Cited by: 1
Authors
Motehayeri, Seyed Mohammad Seyed [1 ]
Baghi, Vahid [1 ]
Miandoab, Ehsan Maani [1 ]
Moeini, Ali [1 ]
Affiliations
[1] Univ Tehran, Dept Algorithms & Computat, Tehran, Iran
Source
2021 26TH INTERNATIONAL COMPUTER CONFERENCE, COMPUTER SOCIETY OF IRAN (CSICC) | 2021
Keywords
deep reinforcement learning; experience replay buffer; deep deterministic policy gradient; asynchronous episodic deep deterministic policy gradient;
DOI
10.1109/CSICC52343.2021.9420550
CLC Number
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Off-Policy Deep Reinforcement Learning (DRL) algorithms such as Deep Deterministic Policy Gradient (DDPG) have been used to teach intelligent agents to solve complicated problems in continuous state-action environments. Several methods have been successfully applied to improve the training performance of these algorithms and achieve better speed and stability, such as experience replay, which selects a batch of transitions from the replay memory buffer. However, environments with sparse reward functions remain a challenge for these algorithms and degrade their performance. This research aims to make the transition selection process more efficient by increasing the likelihood of selecting important transitions from the replay memory buffer. Our proposed method works better with sparse reward functions, in particular in environments with termination conditions. We use a secondary replay memory buffer that stores the more critical transitions. In the training process, transitions are sampled from both the primary replay buffer and the secondary replay buffer. We also use parallel environments to asynchronously execute and fill the primary and secondary replay buffers. This method helps us obtain better performance and stability. Finally, we evaluate our proposed approach on the Crawler model, one of the Unity ML-Agents tasks with a sparse reward function, against DDPG and AE-DDPG.
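The abstract's core mechanism, a primary buffer plus a secondary buffer holding critical transitions, with training batches drawn from both, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `DuplicatedReplayBuffer`, the capacities, the mixing fraction, and the "critical" criterion (e.g. terminal or rewarded transitions) are all assumptions made for the example.

```python
import random
from collections import deque

class DuplicatedReplayBuffer:
    """Illustrative sketch: every transition goes into the primary buffer;
    transitions flagged as critical are duplicated into a secondary buffer.
    Sampled batches mix transitions from both buffers."""

    def __init__(self, capacity=100_000, critical_capacity=10_000,
                 critical_fraction=0.5):
        self.primary = deque(maxlen=capacity)       # all transitions
        self.secondary = deque(maxlen=critical_capacity)  # critical ones only
        self.critical_fraction = critical_fraction  # target share of batch from secondary

    def add(self, transition, critical=False):
        # 'critical' is a hypothetical flag, e.g. nonzero reward or episode termination
        self.primary.append(transition)
        if critical:
            self.secondary.append(transition)

    def sample(self, batch_size):
        # Take up to critical_fraction of the batch from the secondary buffer,
        # then fill the rest with uniform samples from the primary buffer.
        n_sec = min(int(batch_size * self.critical_fraction), len(self.secondary))
        batch = random.sample(self.secondary, n_sec)
        batch += random.sample(self.primary, batch_size - n_sec)
        return batch
```

In an asynchronous setting such as the one the abstract describes, parallel environment workers would call `add` concurrently while the learner calls `sample`; the sketch omits that synchronization for clarity.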
Pages: 6
Related Papers (50 records)
  • [1] Deep Deterministic Policy Gradient With Classified Experience Replay
    Shi S.-M.
    Liu Q.
    Zidonghua Xuebao/Acta Automatica Sinica, 2022, 48 (07): 1816 - 1823
  • [2] Deep Deterministic Policy Gradient Based on Double Network Prioritized Experience Replay
    Kang, Chaohai
    Rong, Chuiting
    Ren, Weijian
    Huo, Fengcai
    Liu, Pengyun
    IEEE ACCESS, 2021, 9 : 60296 - 60308
  • [3] Asynchronous Methods for Multi-agent Deep Deterministic Policy Gradient
    Jiang, Xuesong
    Li, Zhipeng
    Wei, Xiumei
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302 : 711 - 721
  • [4] Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay
    Sun, Xiaoying
    Chen, Jinchao
    Du, Chenglie
    Zhan, Mengying
    2022 IEEE 6TH ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2022, : 988 - 992
  • [5] Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay
    Cicek, Dogan C.
    Duran, Enes
    Saglam, Baturay
    Mutlu, Furkan B.
    Kozat, Suleyman S.
    2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 1255 - 1262
  • [6] Deep deterministic policy gradient algorithm based on dung beetle optimization and priority experience replay mechanism
    Zhu, Hengwei
    Rong, Chuiting
    Liu, Haorui
    SCIENTIFIC REPORTS, 15 (1)
  • [7] Efficient experience replay based deep deterministic policy gradient for AGC dispatch in integrated energy system
    Li, Jiawen
    Yu, Tao
    Zhang, Xiaoshun
    Li, Fusheng
    Lin, Dan
    Zhu, Hanxin
    APPLIED ENERGY, 2021, 285
  • [8] MP-TD3: Multi-Pool Prioritized Experience Replay-Based Asynchronous Twin Delayed Deep Deterministic Policy Gradient Algorithm
    Tan, Wenwen
    Huang, Detian
    IEEE ACCESS, 2024, 12 : 105268 - 105280
  • [9] Asynchronous Episodic Deep Deterministic Policy Gradient: Toward Continuous Control in Computationally Complex Environments
    Zhang, Zhizheng
    Chen, Jiale
    Chen, Zhibo
    Li, Weiping
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (02) : 604 - 613
  • [10] Policy Space Noise in Deep Deterministic Policy Gradient
    Yan, Yan
    Liu, Quan
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302 : 624 - 634