A Novel Adaptive Sampling Strategy for Deep Reinforcement Learning

Cited by: 1

Authors:

Liang, Xingxing [1]

Chen, Li [1]

Feng, Yanghe [1]

Liu, Zhong [1]

Ma, Yang [1]

Huang, Kuihua [1]

Affiliations:

[1] National University of Defense Technology, College of Systems Engineering, Changsha, People's Republic of China

Source:

INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS | 2021 / Vol. 20 / Issue 02

Keywords:

Deep reinforcement learning; an adaptive factor; DQN; Actor-Critic (AC) algorithm; GAME; GO

DOI:

10.1142/S1469026821500115

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Reinforcement learning, an effective method for solving complex sequential decision-making problems, plays an important role in areas such as intelligent decision-making and behavioral cognition. The experience replay mechanism has contributed to the development of deep reinforcement learning by reusing past samples to improve sample efficiency. However, the existing prioritized experience replay mechanism changes the sample distribution in the sample set because it assigns a higher sampling frequency to specific transitions, and it cannot be applied to actor-critic and other on-policy reinforcement learning algorithms. To address this, we propose an adaptive factor based on the TD-error, which further increases sample utilization by giving more attention weight to samples with larger TD-error, and we embed it flexibly into the original Deep Q Network (DQN) and Advantage Actor-Critic (A2C) algorithms to improve their performance. We then evaluate the proposed architecture on CartPole-v1 and six Atari game environments. Under both fixed and annealed temperature settings, the results show that the improved algorithms outperform vanilla DQN and the original A2C in cumulative reward and climb speed.
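The abstract does not give the formula for the adaptive factor, so the following is only an illustrative sketch of the general idea it describes: weighting each sample's loss by a temperature-controlled function of its TD-error instead of changing how often it is sampled. The softmax form, the rescaling so that weights average to 1, the linear annealing schedule, and all function and parameter names (`adaptive_weights`, `weighted_td_loss`, `annealed_temperature`, `t_start`, `t_end`) are assumptions made for illustration and are not taken from the paper.

```python
import math


def adaptive_weights(td_errors, temperature):
    """Hypothetical adaptive factor: softmax over |TD-error| / temperature.

    The weights are rescaled to average 1, so a very high temperature
    recovers the ordinary unweighted loss. This form is an assumption,
    not the paper's definition.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not td_errors:
        return []
    scaled = [abs(d) / temperature for d in td_errors]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    n = len(td_errors)
    return [n * e / total for e in exps]


def weighted_td_loss(td_errors, temperature):
    """Mean squared TD-error with each term scaled by its adaptive weight."""
    if not td_errors:
        return 0.0
    weights = adaptive_weights(td_errors, temperature)
    return sum(w * d * d for w, d in zip(weights, td_errors)) / len(td_errors)


def annealed_temperature(step, total_steps, t_start=1.0, t_end=10.0):
    """Assumed linear schedule between two hypothetical endpoint temperatures."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start + frac * (t_end - t_start)


if __name__ == "__main__":
    batch_td = [0.1, -2.0, 0.5]  # made-up TD-errors for a batch of three
    print(adaptive_weights(batch_td, temperature=1.0))
    print(weighted_td_loss(batch_td, temperature=annealed_temperature(0, 100)))
```

Because the weighting is applied to the loss terms of whatever batch is at hand, a scheme of this kind leaves the sampling distribution untouched, which is why it could in principle be used with on-policy methods such as A2C as well as with DQN.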

Pages: 20

50 items in total

[31] Collaborative Optimization of Energy Management Strategy and Adaptive Cruise Control Based on Deep Reinforcement Learning
Peng, Jiankun
Fan, Yi
Yin, Guodong
Jiang, Ruhai
[J]. IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION, 2023, 9 (01) : 34 - 44
[32] Learning a Diagnostic Strategy on Medical Data With Deep Reinforcement Learning
Zhu, Mengxiao
Zhu, Haogang
[J]. IEEE ACCESS, 2021, 9 : 84122 - 84133
[33] A Penetration Strategy Combining Deep Reinforcement Learning and Imitation Learning
Wang, Xiaofang
Gu, Kunren
[J]. Yuhang Xuebao/Journal of Astronautics, 2023, 44 (06) : 914 - 925
[34] A Novel Trading Strategy Framework Based on Reinforcement Deep Learning for Financial Market Predictions
Cheng, Li-Chen
Huang, Yu-Hsiang
Hsieh, Ming-Hua
Wu, Mu-En
[J]. MATHEMATICS, 2021, 9 (23)
[35] Adaptive evolution strategy with ensemble of mutations for Reinforcement Learning
Ajani, Oladayo S.
Mallipeddi, Rammohan
[J]. KNOWLEDGE-BASED SYSTEMS, 2022, 245
[37] Variable Sampling Period Adaptive Control Based on Reinforcement Learning
Lemos, Joao M.
Parente, Francisco
Cunha, Rita
[J]. CONTROLO 2022, 2022, 930 : 577 - 586
[38] MARLAS: Multi Agent Reinforcement Learning for Cooperated Adaptive Sampling
Pan, Lishuo
Manjanna, Sandeep
Hsieh, M. Ani
[J]. DISTRIBUTED AUTONOMOUS ROBOTIC SYSTEMS, DARS 2022, 2024, 28 : 347 - 362
[39] Optimistic Sampling Strategy for Data-Efficient Reinforcement Learning
Zhao, Dongfang
Liu, Jiafeng
Wu, Rui
Cheng, Dansong
Tang, Xianglong
[J]. IEEE ACCESS, 2019, 7 : 55763 - 55769
[40] Hierarchical Deep Reinforcement Learning for VWAP Strategy Optimization
Li, Xiaodong
Wu, Pangjing
Zou, Chenxin
Li, Qing
[J]. IEEE TRANSACTIONS ON BIG DATA, 2024, 10 (03) : 288 - 300
