Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Cited by: 0
Authors
Zhou, Wei [1]
Li, Yiying [1]
Yang, Yongxin [2]
Wang, Huaimin [1]
Hospedales, Timothy M. [2,3]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[2] Univ Edinburgh, Sch Informat, Edinburgh, Scotland
[3] Samsung AI Ctr, Cambridge, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); National Natural Science Foundation of China (NSFC);
Keywords
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Off-Policy Actor-Critic (OffP-AC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated by temporal-difference learning, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a flexible meta-critic framework based on observing the learning process and meta-learning an additional loss for the actor that accelerates and improves actor-critic learning. Compared to existing meta-learning algorithms, the meta-critic is learned rapidly and online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic is designed for off-policy learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning benefits a variety of continuous control tasks when combined with the contemporary OffP-AC methods DDPG, TD3 and SAC.
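To make the mechanism the abstract describes concrete, below is a minimal PyTorch sketch: the actor is trained on its usual critic-derived loss plus an auxiliary loss produced by a meta-critic network, and the meta-critic is updated online by differentiating through a one-step actor update to check whether its auxiliary loss improved the main objective on held-out data. The network shapes, the (state, action) feature choice, the inner step size, and the names `main_loss`, `aux_loss` and `meta_update` are all illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of online meta-critic learning, assuming hypothetical
# dimensions and a one-step differentiable inner actor update.
import torch
import torch.nn as nn

state_dim, action_dim, hidden = 8, 2, 64

actor = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                      nn.Linear(hidden, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                       nn.Linear(hidden, 1))
# Hypothetical meta-critic: maps (state, action) to a scalar auxiliary loss.
meta_critic = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                            nn.Linear(hidden, 1))

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
meta_opt = torch.optim.Adam(meta_critic.parameters(), lr=1e-3)
inner_lr = 1e-3  # step size of the differentiable inner actor update


def main_loss(params, states):
    # Standard OffP-AC actor loss: maximize Q(s, pi(s)).
    actions = torch.func.functional_call(actor, params, (states,))
    return -critic(torch.cat([states, actions], dim=-1)).mean()


def aux_loss(params, states):
    # Meta-learned auxiliary actor loss supplied by the meta-critic.
    actions = torch.func.functional_call(actor, params, (states,))
    return meta_critic(torch.cat([states, actions], dim=-1)).mean()


def meta_update(train_states, val_states):
    # One online meta-step: differentiate through an inner actor update.
    params = dict(actor.named_parameters())
    inner = main_loss(params, train_states) + aux_loss(params, train_states)
    grads = torch.autograd.grad(inner, list(params.values()), create_graph=True)
    updated = {n: p - inner_lr * g
               for (n, p), g in zip(params.items(), grads)}
    # Outer objective: the main loss of the updated actor on held-out states;
    # its gradient flows back through the inner step into the meta-critic.
    meta_opt.zero_grad()
    main_loss(updated, val_states).backward()
    meta_opt.step()


def actor_update(states):
    # Ordinary actor step on the combined main + auxiliary loss.
    params = dict(actor.named_parameters())
    actor_opt.zero_grad()
    (main_loss(params, states) + aux_loss(params, states)).backward()
    actor_opt.step()


# Toy usage with random states standing in for replay-buffer samples; the
# critic itself would be trained by temporal-difference updates (not shown).
train_s, val_s = torch.randn(32, state_dim), torch.randn(32, state_dim)
meta_update(train_s, val_s)
actor_update(train_s)
```

In this sketch the meta-critic only receives gradient signal when its auxiliary loss changes how well the post-update actor serves the main objective on held-out states, which is one plausible reading of "observing the learning process" in the abstract; the paper's exact meta-objective may differ.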
Pages: 12
Related Papers (50 in total)
  • [1] Meta attention for Off-Policy Actor-Critic
    Huang, Jiateng
    Huang, Wanrong
    Lan, Long
    Wu, Dan
    [J]. NEURAL NETWORKS, 2023, 163: 86-96
  • [2] Generalized Off-Policy Actor-Critic
    Zhang, Shangtong
    Boehmer, Wendelin
    Whiteson, Shimon
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Off-Policy Actor-critic for Recommender Systems
    Chen, Minmin
    Xu, Can
    Gatto, Vince
    Jain, Devanshu
    Kumar, Aviral
    Chi, Ed
    [J]. PROCEEDINGS OF THE 16TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2022, 2022: 338-349
  • [4] Off-Policy Actor-Critic with Emphatic Weightings
    Graves, Eric
    Imani, Ehsan
    Kumaraswamy, Raksha
    White, Martha
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [5] Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay
    Tasfi, Norman
    Capretz, Miriam
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [6] Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
    Zhang, Yan
    Zavlanos, Michael M.
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019: 4674-4679
  • [7] Variance Penalized On-Policy and Off-Policy Actor-Critic
    Jain, Arushi
    Patil, Gandharv
    Jain, Ayush
    Khetarpal, Khimya
    Precup, Doina
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35: 7899-7907
  • [8] Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
    Xu, Tengyu
    Yang, Zhuoran
    Wang, Zhaoran
    Liang, Yingbin
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [9] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
    Haarnoja, Tuomas
    Zhou, Aurick
    Abbeel, Pieter
    Levine, Sergey
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [10] Off-policy actor-critic deep reinforcement learning methods for alert prioritization in intrusion detection systems
    Chavali, Lalitha
    Krishnan, Abhinav
    Saxena, Paresh
    Mitra, Barsha
    Chivukula, Aneesh Sreevallabh
    [J]. COMPUTERS & SECURITY, 2024, 142