Episode-Experience Replay Based Tree-Backup Method for Off-Policy Actor-Critic Algorithm

Cited: 1
Authors
Jiang, Haobo [1]
Qian, Jianjun [1]
Xie, Jin [1]
Yang, Jian [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, Minist Educ, Nanjing 210094, Peoples R China
Keywords
Off-policy actor-critic policy gradient; Tree-backup algorithm; All-action method; Episode-experience replay
DOI
10.1007/978-3-030-03398-9_48
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Off-policy algorithms have played important roles in deep reinforcement learning. Since the off-policy policy gradient is a biased estimate, previous works employed importance sampling to obtain an unbiased estimate, which requires the behavior policy to be known in advance. However, it is difficult to choose a reasonable behavior policy for complex agents. Moreover, importance sampling usually produces large variance. To address these problems, this paper presents a novel actor-critic policy gradient algorithm. Specifically, we employ the tree-backup method in the off-policy setting to achieve an unbiased estimate of the target policy gradient without using importance sampling. Meanwhile, we combine naive episode-experience replay with experience replay to obtain trajectory samples and reduce the strong correlations between these samples. The experimental results demonstrate the advantages of the proposed method over the compared methods.
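A minimal sketch of the two ideas summarized in the abstract, assuming a discrete action space and an arbitrary behavior policy: the n-step tree-backup return weights each branch by the target policy's own action probabilities, so no importance-sampling ratios are needed, and whole episodes are stored so that multi-step trajectory segments can be replayed. The names `tree_backup_target` and `EpisodeReplayBuffer` and their interfaces are illustrative assumptions, not the paper's implementation.

```python
import random
import numpy as np


def tree_backup_target(rewards, states, actions, q_values, pi, gamma=0.99):
    """n-step tree-backup return for a segment collected by any behavior policy.

    rewards  : [r_1, ..., r_n]      rewards observed along the segment
    states   : [s_0, ..., s_n]      visited states (s_n is the bootstrap state)
    actions  : [a_0, ..., a_{n-1}]  actions actually taken by the behavior policy
    q_values : q_values(s) -> np.ndarray of Q(s, .) from the current critic
    pi       : pi(s)       -> np.ndarray of target-policy probabilities pi(. | s)

    The off-policy correction comes from weighting each branch by the target
    policy's probabilities; no importance-sampling ratio appears.
    """
    n = len(rewards)
    # Bootstrap: expected action value under the target policy at the last state.
    G = float(np.dot(pi(states[n]), q_values(states[n])))
    # Tree-backup recursion, walking the segment backwards.
    for k in reversed(range(n)):
        if k + 1 < n:
            s_next, a_next = states[k + 1], actions[k + 1]
            probs, q = pi(s_next), q_values(s_next)
            # Untaken actions are "leaves" valued by the critic; the branch of
            # the action actually taken continues with the deeper return G.
            leaves = float(np.dot(probs, q)) - probs[a_next] * q[a_next]
            G = rewards[k] + gamma * (leaves + probs[a_next] * G)
        else:
            G = rewards[k] + gamma * G
    return G


class EpisodeReplayBuffer:
    """Stores whole episodes so that multi-step segments can be replayed."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.episodes = []

    def add_episode(self, transitions, final_state):
        """transitions: list of (state, action, reward); final_state: terminal state."""
        self.episodes.append((transitions, final_state))
        if len(self.episodes) > self.capacity:
            self.episodes.pop(0)  # drop the oldest episode

    def sample_segment(self, n):
        """Uniformly sample an n-step segment from a randomly chosen episode."""
        transitions, final_state = random.choice(self.episodes)
        start = random.randint(0, max(0, len(transitions) - n))
        seg = transitions[start:start + n]
        states = [t[0] for t in seg]
        actions = [t[1] for t in seg]
        rewards = [t[2] for t in seg]
        # Bootstrap state: the state that follows the sampled segment.
        end = start + len(seg)
        states.append(transitions[end][0] if end < len(transitions) else final_state)
        return states, actions, rewards
```

In an actor-critic setting one would, roughly, use G - Q(s_0, a_0) as the critic's error and reuse the same target-policy probabilities to weight the actor update in the all-action style; these details are only indicative of the general technique, not of the paper's exact algorithm.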
Pages: 562 - 573
Number of pages: 12
Related Papers
46 records in total
  • [41] Acquiring of walking behavior for four-legged robots using actor-critic method based on policy gradient
    Inoue, R.
    Watanabe, K.
    Igarashi, H.
    2010 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT CONTROL, 2010, : 795 - 800
  • [42] Control of nonholonomic mobile robot by an adaptive actor-critic method with simulated experience based value-functions
    Syam, R
    Watanabe, K
    Izumi, K
    Kiguchi, K
    2002 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS I-IV, PROCEEDINGS, 2002, : 3960 - 3965
  • [43] A Vehicle Path Planning Algorithm Based on Mixed Policy Gradient Actor-Critic Model with Random Escape Term and Filter Optimization
    Nai, Wei
    Yang, Zan
    Lin, Daxuan
    Li, Dan
    Xing, Yidan
    JOURNAL OF MATHEMATICS, 2022, 2022
  • [44] Deep reinforcement learning-based model-free path planning and collision avoidance for UAVs: A soft actor-critic with hindsight experience replay approach
    Lee, Myoung Hoon
    Moon, Jun
    ICT EXPRESS, 2023, 9 (03): 403 - 408
  • [45] Autonomous Navigation Decision-Making Method for a Smart Marine Surface Vessel Based on an Improved Soft Actor-Critic Algorithm
    Cui, Zhewen
    Guan, Wei
    Zhang, Xianku
    Zhang, Cheng
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2023, 11 (08)
  • [46] Informer-embedded prioritized experience replay-based soft actor-critic for ultra-low frequency oscillation suppression of pumped hydropower storage systems
    Yin, Linfei
    Huang, Wenxuan
    JOURNAL OF ENERGY STORAGE, 2025, 114