Offline-Online Actor-Critic

Cited by: 1
Authors
Wang X. [1 ]
Hou D. [1 ]
Huang L. [1 ]
Cheng Y. [1 ]
Affiliations
[1] China University of Mining and Technology, Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, and School of Information and Control Engineering, Xuzhou
Source
IEEE Transactions on Artificial Intelligence
Keywords
Actor-critic; behavior clone (BC) constraint; distribution shift; offline-online reinforcement learning (RL); policy performance degradation
DOI
10.1109/TAI.2022.3225251
Abstract
Offline-online reinforcement learning (RL) can effectively address the problem of missing data (commonly known as transitions) in offline RL. However, due to distribution shift, policy performance may degrade when an agent moves from the offline to the online training phase. In this article, we first analyze the problems of distribution shift and policy performance degradation in offline-online RL. To alleviate these problems, we then propose a novel RL algorithm, offline-online actor-critic (O2AC). In O2AC, a behavior clone constraint term is introduced into the policy objective function to address distribution shift during the offline training phase. During the online training phase, the influence of the behavior clone constraint term is gradually reduced, which alleviates policy performance degradation. Experiments show that O2AC outperforms existing offline-online RL algorithms. © 2020 IEEE.
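The abstract describes a policy objective that trades the critic's value estimate against a behavior clone (BC) constraint whose influence is annealed once online training begins. The following is a minimal PyTorch sketch of that idea; the network shapes, the squared-error form of the BC term, and the linear annealing schedule `bc_weight` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# Illustrative network sizes (assumed, not from the paper).
state_dim, action_dim = 17, 6
actor = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                      nn.Linear(256, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                       nn.Linear(256, 1))

def policy_loss(states, dataset_actions, alpha):
    """O2AC-style policy objective (sketch): maximize the critic's value
    while a BC penalty, weighted by alpha, keeps the policy close to the
    actions observed in the offline dataset."""
    pi_actions = actor(states)
    q_term = -critic(torch.cat([states, pi_actions], dim=-1)).mean()
    bc_term = ((pi_actions - dataset_actions) ** 2).mean()  # behavior clone constraint
    return q_term + alpha * bc_term

def bc_weight(step, online_start, anneal_steps, alpha0=1.0):
    """Hypothetical schedule: full BC weight during offline training,
    linearly annealed toward zero after the switch to online training."""
    if step < online_start:
        return alpha0
    frac = min(1.0, (step - online_start) / anneal_steps)
    return alpha0 * (1.0 - frac)

# Example call with random tensors standing in for a sampled batch.
s = torch.randn(32, state_dim)
a = torch.rand(32, action_dim) * 2 - 1
loss = policy_loss(s, a, bc_weight(step=0, online_start=100_000, anneal_steps=50_000))
```

Read this way, the constraint anchors the policy to the dataset distribution offline, and its gradually vanishing weight lets the online phase recover an unconstrained actor-critic update without an abrupt change of objective.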
Pages: 61-69
Number of pages: 8
Related Papers (50 in total)
  • [1] Generalized Offline Actor-Critic with Behavior Regularization
    Cheng Y.-H.
    Huang L.-Y.
    Hou D.-Y.
    Zhang J.-Z.
    Chen J.-L.
    Wang X.-S.
    Jisuanji Xuebao/Chinese Journal of Computers, 2023, 46(4): 843-855
  • [2] Offline Deterministic Actor-Critic Based on Uncertainty Estimation
    Feng H.-T.
    Cheng Y.-H.
    Wang X.-S.
    Jisuanji Xuebao/Chinese Journal of Computers, 2024, 47(4): 717-732
  • [3] Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
    Wu, Yue
    Zhai, Shuangfei
    Srivastava, Nitish
    Susskind, Joshua
    Zhang, Jian
    Salakhutdinov, Ruslan
    Goh, Hanlin
    International Conference on Machine Learning (ICML 2021), PMLR vol. 139, 2021
  • [4] Dual Behavior Regularized Offline Deterministic Actor-Critic
    Cao, Shuo
    Wang, Xuesong
    Cheng, Yuhu
    IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(8): 4841-4852
  • [5] Actor-Critic Algorithms with Online Feature Adaptation
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
    Borkar, Vivek S.
    ACM Transactions on Modeling and Computer Simulation, 2016, 26(4)
  • [6] Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
    Zanette, Andrea
    Wainwright, Martin J.
    Brunskill, Emma
    Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021
  • [7] Robust Offline Actor-Critic with On-Policy Regularized Policy Evaluation
    Cao, Shuo
    Wang, Xuesong
    Cheng, Yuhu
    IEEE/CAA Journal of Automatica Sinica, 2024, 11(12): 2497-2511
  • [8] Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
    Zhu, Hanlin
    Rashidinejad, Paria
    Jiao, Jiantao
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [9] Actor-Critic Algorithms
    Konda, V. R.
    Tsitsiklis, J. N.
    Advances in Neural Information Processing Systems 12, 2000: 1008-1014