Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Cited: 0
Authors
Zhou, Wei [1 ]
Li, Yiying [1 ]
Yang, Yongxin [2 ]
Wang, Huaimin [1 ]
Hospedales, Timothy M. [2 ,3 ]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[2] Univ Edinburgh, Sch Informat, Edinburgh, Scotland
[3] Samsung AI Ctr, Cambridge, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); National Natural Science Foundation of China;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Off-Policy Actor-Critic (OffP-AC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated by temporal-difference learning, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a flexible meta-critic framework that observes the learning process and meta-learns an additional loss for the actor, accelerating and improving actor-critic learning. In contrast to existing meta-learning algorithms, the meta-critic is learned rapidly and online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic is designed for off-policy learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning benefits a variety of continuous control tasks when combined with the contemporary OffP-AC methods DDPG, TD3 and SAC.
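To make the framework concrete, the following is a minimal PyTorch sketch of one online meta-critic iteration, assuming a DDPG-style actor loss -Q(s, pi(s)). Everything here (network sizes, the names meta_critic, main_loss and aux_loss, the random placeholder batches, the plain-SGD inner step) is an illustrative assumption, not the authors' implementation; the critic's own temporal-difference update is omitted for brevity.

    import torch
    import torch.nn as nn
    from torch.func import functional_call  # PyTorch >= 2.0

    STATE_DIM, ACT_DIM = 8, 2  # placeholder environment dimensions

    actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                          nn.Linear(64, ACT_DIM), nn.Tanh())
    critic = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64), nn.ReLU(),
                           nn.Linear(64, 1))
    # Meta-critic h_omega: maps (state, action) to a learned auxiliary actor loss.
    meta_critic = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64), nn.ReLU(),
                                nn.Linear(64, 1))

    actor_lr = 1e-3
    actor_opt = torch.optim.Adam(actor.parameters(), lr=actor_lr)
    meta_opt = torch.optim.Adam(meta_critic.parameters(), lr=1e-3)

    def main_loss(params, states):
        # Standard DDPG-style actor objective: maximise Q(s, pi(s)).
        actions = functional_call(actor, params, (states,))
        return -critic(torch.cat([states, actions], dim=-1)).mean()

    def aux_loss(params, states):
        # Auxiliary actor loss produced by the meta-critic.
        actions = functional_call(actor, params, (states,))
        return meta_critic(torch.cat([states, actions], dim=-1)).mean()

    train_s = torch.randn(32, STATE_DIM)  # stand-ins for replay-buffer batches
    val_s = torch.randn(32, STATE_DIM)

    params = dict(actor.named_parameters())

    # Inner step: a differentiable SGD update of the actor under main + aux loss.
    inner = main_loss(params, train_s) + aux_loss(params, train_s)
    grads = torch.autograd.grad(inner, list(params.values()), create_graph=True)
    updated = {n: p - actor_lr * g for (n, p), g in zip(params.items(), grads)}

    # Meta step: the auxiliary loss is good if the *updated* actor scores better
    # on the main loss for held-out data; backprop through the inner update.
    meta_loss = main_loss(updated, val_s)
    meta_opt.zero_grad()
    meta_loss.backward()  # also deposits grads on actor/critic; cleared below
    meta_opt.step()

    # Ordinary actor update with both losses, using the just-improved meta-critic.
    actor_opt.zero_grad()
    (main_loss(params, train_s) + aux_loss(params, train_s)).backward()
    actor_opt.step()

The design choice mirrored here is the abstract's core idea of learning the auxiliary loss online alongside the task, with a single inner gradient step serving as the meta-learning lookahead; a full version would interleave this with critic TD updates and draw both batches from the replay buffer.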
Pages: 12
Related Papers
50 records in total
  • [1] Meta attention for Off-Policy Actor-Critic
    Huang, Jiateng
    Huang, Wanrong
    Lan, Long
    Wu, Dan
    NEURAL NETWORKS, 2023, 163 : 86 - 96
  • [2] Generalized Off-Policy Actor-Critic
    Zhang, Shangtong
    Boehmer, Wendelin
    Whiteson, Shimon
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NEURIPS 2019), 2019, 32
  • [3] Off-Policy Actor-critic for Recommender Systems
    Chen, Minmin
    Xu, Can
    Gatto, Vince
    Jain, Devanshu
    Kumar, Aviral
    Chi, Ed
    PROCEEDINGS OF THE 16TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2022, 2022, : 338 - 349
  • [4] Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay
    Tasfi, Norman
    Capretz, Miriam
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [5] Off-Policy Actor-Critic with Emphatic Weightings
    Graves, Eric
    Imani, Ehsan
    Kumaraswamy, Raksha
    White, Martha
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [6] Distributed Off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
    Zhang, Yan
    Zavlanos, Michael M.
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4674 - 4679
  • [7] An efficient and lightweight off-policy actor-critic reinforcement learning framework
    Zhang, Huaqing
    Ma, Hongbin
    Zhang, Xiaofei
    Mersha, Bemnet Wondimagegnehu
    Wang, Li
    Jin, Ying
    APPLIED SOFT COMPUTING, 2024, 163
  • [8] Variance Penalized On-Policy and Off-Policy Actor-Critic
    Jain, Arushi
    Patil, Gandharv
    Jain, Ayush
    Khetarpal, Khimya
    Precup, Doina
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7899 - 7907
  • [9] Relative importance sampling for off-policy actor-critic in deep reinforcement learning
    Humayoo, Mahammad
    Zheng, Gengzhong
    Dong, Xiaoqing
    Miao, Liming
    Qiu, Shuwei
    Zhou, Zexun
    Wang, Peitao
    Ullah, Zakir
    Junejo, Naveed Ur Rehman
    Cheng, Xueqi
    SCIENTIFIC REPORTS, 2025, 15 (1)
  • [10] Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
    Xu, Tengyu
    Yang, Zhuoran
    Wang, Zhaoran
    Liang, Yingbin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139