Episode-Experience Replay Based Tree-Backup Method for Off-Policy Actor-Critic Algorithm

Cited: 1
Authors
Jiang, Haobo [1]
Qian, Jianjun [1]
Xie, Jin [1]
Yang, Jian [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, Minist Educ, Nanjing 210094, Peoples R China
Keywords
Off-policy actor-critic policy gradient; Tree-backup algorithm; All-action method; Episode-experience replay
DOI
10.1007/978-3-030-03398-9_48
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Off-policy algorithms have played important roles in deep reinforcement learning. Since the off-policy policy gradient is a biased estimate, previous works employed importance sampling to obtain an unbiased estimate, which requires the behavior policy to be known in advance. However, it is difficult to choose a reasonable behavior policy for complex agents. Moreover, importance sampling usually produces large variance. To address these problems, this paper presents a novel actor-critic policy gradient algorithm. Specifically, we employ the tree-backup method in the off-policy setting to achieve an unbiased estimate of the target policy gradient without using importance sampling. Meanwhile, we combine naive episode-experience replay with experience replay to obtain trajectory samples and reduce the strong correlations between these samples. The experimental results demonstrate the advantages of the proposed method over the compared methods.
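A minimal sketch of the two ideas summarized in the abstract, assuming a discrete action space and an arbitrary behavior policy: the n-step tree-backup return weights each branch by the target policy's own action probabilities, so no importance-sampling ratios are needed, and whole episodes are stored so that multi-step trajectory segments can be replayed. The names `tree_backup_target` and `EpisodeReplayBuffer` and their interfaces are illustrative assumptions, not the paper's implementation.

```python
import random
import numpy as np


def tree_backup_target(rewards, states, actions, q_values, pi, gamma=0.99):
    """n-step tree-backup return for a segment collected by any behavior policy.

    rewards  : [r_1, ..., r_n]      rewards observed along the segment
    states   : [s_0, ..., s_n]      visited states (s_n is the bootstrap state)
    actions  : [a_0, ..., a_{n-1}]  actions actually taken by the behavior policy
    q_values : q_values(s) -> np.ndarray of Q(s, .) from the current critic
    pi       : pi(s)       -> np.ndarray of target-policy probabilities pi(. | s)

    The off-policy correction comes from weighting each branch by the target
    policy's probabilities; no importance-sampling ratio appears.
    """
    n = len(rewards)
    # Bootstrap: expected action value under the target policy at the last state.
    G = float(np.dot(pi(states[n]), q_values(states[n])))
    # Tree-backup recursion, walking the segment backwards.
    for k in reversed(range(n)):
        if k + 1 < n:
            s_next, a_next = states[k + 1], actions[k + 1]
            probs, q = pi(s_next), q_values(s_next)
            # Untaken actions are "leaves" valued by the critic; the branch of
            # the action actually taken continues with the deeper return G.
            leaves = float(np.dot(probs, q)) - probs[a_next] * q[a_next]
            G = rewards[k] + gamma * (leaves + probs[a_next] * G)
        else:
            G = rewards[k] + gamma * G
    return G


class EpisodeReplayBuffer:
    """Stores whole episodes so that multi-step segments can be replayed."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.episodes = []

    def add_episode(self, transitions, final_state):
        """transitions: list of (state, action, reward); final_state: terminal state."""
        self.episodes.append((transitions, final_state))
        if len(self.episodes) > self.capacity:
            self.episodes.pop(0)  # drop the oldest episode

    def sample_segment(self, n):
        """Uniformly sample an n-step segment from a randomly chosen episode."""
        transitions, final_state = random.choice(self.episodes)
        start = random.randint(0, max(0, len(transitions) - n))
        seg = transitions[start:start + n]
        states = [t[0] for t in seg]
        actions = [t[1] for t in seg]
        rewards = [t[2] for t in seg]
        # Bootstrap state: the state that follows the sampled segment.
        end = start + len(seg)
        states.append(transitions[end][0] if end < len(transitions) else final_state)
        return states, actions, rewards
```

In an actor-critic setting one would, roughly, use G - Q(s_0, a_0) as the critic's error and reuse the same target-policy probabilities to weight the actor update in the all-action style; these details are only indicative of the general technique, not of the paper's exact algorithm.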
Pages: 562 - 573
Number of pages: 12
Related Papers
46 records in total
  • [41] Acquiring of walking behavior for four-legged robots using actor-critic method based on policy gradient
    Inoue, R.
    Watanabe, K.
    Igarashi, H.
    2010 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT CONTROL, 2010, : 795 - 800
  • [42] Control of nonholonomic mobile robot by an adaptive actor-critic method with simulated experience based value-functions
    Syam, R
    Watanabe, K
    Izumi, K
    Kiguchi, K
    2002 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS I-IV, PROCEEDINGS, 2002, : 3960 - 3965
  • [43] A Vehicle Path Planning Algorithm Based on Mixed Policy Gradient Actor-Critic Model with Random Escape Term and Filter Optimization
    Nai, Wei
    Yang, Zan
    Lin, Daxuan
    Li, Dan
    Xing, Yidan
    JOURNAL OF MATHEMATICS, 2022, 2022
  • [44] Deep reinforcement learning-based model-free path planning and collision avoidance for UAVs: A soft actor-critic with hindsight experience replay approach
    Lee, Myoung Hoon
    Moon, Jun
    ICT EXPRESS, 2023, 9 (03): 403 - 408
  • [45] Autonomous Navigation Decision-Making Method for a Smart Marine Surface Vessel Based on an Improved Soft Actor-Critic Algorithm
    Cui, Zhewen
    Guan, Wei
    Zhang, Xianku
    Zhang, Cheng
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2023, 11 (08)
  • [46] Informer-embedded prioritized experience replay-based soft actor-critic for ultra-low frequency oscillation suppression of pumped hydropower storage systems
    Yin, Linfei
    Huang, Wenxuan
    JOURNAL OF ENERGY STORAGE, 2025, 114