Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Times cited: 0

Authors:

Zhou, Wei [1]

Li, Yiying [1]

Yang, Yongxin [2]

Wang, Huaimin [1]

Hospedales, Timothy M. [2,3]

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China

[2] Univ Edinburgh, Sch Informat, Edinburgh, Scotland

[3] Samsung AI Ctr, Cambridge, England

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020 | 2020 / Vol. 33

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK; National Natural Science Foundation of China (NSFC);

Keywords:

None listed

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Off-Policy Actor-Critic (OffP-AC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference (TD) learning, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a flexible meta-critic framework that observes the learning process and meta-learns an additional loss for the actor that accelerates and improves actor-critic learning. Compared to existing meta-learning algorithms, the meta-critic is learned rapidly online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic is designed for off-policy learners, which currently provide state-of-the-art sample efficiency in reinforcement learning. We demonstrate that online meta-critic learning benefits a variety of continuous control tasks when combined with the contemporary OffP-AC methods DDPG, TD3 and SAC.
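The abstract describes the meta-critic only at a high level. The sketch below is a toy, pure-Python illustration of the general bilevel idea (a look-ahead actor step, a meta-loss on the result, an online meta-update, then the real actor step) under our own assumptions: a one-dimensional deterministic actor whose action is its parameter theta, a fixed, already-trained critic Q(a) = -(a - 3)^2, and a hypothetical auxiliary loss L_aux(theta; omega) = omega * L_main(theta) with a single meta-parameter omega. In the paper the meta-critic is a learned network and the critic is trained with TD on replay data; neither is reproduced here, and all names (`train`, `aux_loss_grad`, `A_STAR`, etc.) are illustrative only.

```python
# Toy setting (assumption, not from the paper): the "policy" outputs
# action = theta, and the critic is already trained and held fixed.
A_STAR = 3.0  # hypothetical action that maximizes the toy critic


def q_value(action: float) -> float:
    """Fixed toy critic Q(a) = -(a - A_STAR)^2, highest at a = A_STAR."""
    return -(action - A_STAR) ** 2


def main_actor_loss_grad(theta: float) -> float:
    """Gradient of the standard actor loss L_main(theta) = -Q(theta)."""
    return 2.0 * (theta - A_STAR)


def aux_loss_grad(theta: float, omega: float) -> float:
    """Gradient w.r.t. theta of the hypothetical auxiliary loss
    L_aux(theta; omega) = omega * L_main(theta)."""
    return omega * main_actor_loss_grad(theta)


def train(steps: int, alpha: float = 0.1, beta: float = 0.1, meta: bool = True):
    """Return (theta, omega) after `steps` actor updates.

    alpha: actor step size; beta: meta step size (both assumed values).
    With meta=False only the main actor loss is used (baseline).
    """
    theta, omega = 0.0, 0.0
    for _ in range(steps):
        g_main = main_actor_loss_grad(theta)
        if meta:
            # 1) Look-ahead actor step using main + auxiliary loss.
            theta_look = theta - alpha * (g_main + aux_loss_grad(theta, omega))
            # 2) Meta-loss: the main loss after the look-ahead step.
            #    d theta_look / d omega = -alpha * g_main (chain rule).
            d_meta_d_omega = main_actor_loss_grad(theta_look) * (-alpha * g_main)
            # 3) Online update of the meta-parameter.
            omega -= beta * d_meta_d_omega
            # 4) Real actor step with the updated auxiliary loss.
            theta -= alpha * (g_main + aux_loss_grad(theta, omega))
        else:
            theta -= alpha * g_main
    return theta, omega


if __name__ == "__main__":
    theta_meta, omega = train(20)
    theta_base, _ = train(20, meta=False)
    print(q_value(theta_meta) > q_value(theta_base))  # → True
    print(theta_meta, theta_base, omega)
```

Because this auxiliary term can only rescale the main gradient, omega effectively learns a larger actor step size online; the paper's meta-critic is more expressive, so the sketch only illustrates the look-ahead and meta-update loop, not the method's reported benefits.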

Pages: 12
