MAML2: meta reinforcement learning via meta-learning for task categories

Cited by: 5

Authors:

Fu, Qiming [1]

Wang, Zhechao [1]

Fang, Nengwei [2]

Xing, Bin [2]

Zhang, Xiao [3]

Chen, Jianping [4]

Affiliations:

[1] Suzhou Univ Sci & Technol, Sch Elect & Informat Engn, Suzhou 215009, Peoples R China

[2] Chongqing Ind Big Data Innovat Ctr Co Ltd, Chongqing 404100, Peoples R China

[3] Xuzhou Med Univ, Sch Med Informat, Xuzhou 221004, Peoples R China

[4] Suzhou Univ Sci & Technol, Sch Architecture & Urban Planning, Suzhou 215009, Peoples R China

Source:

FRONTIERS OF COMPUTER SCIENCE | 2023 / Vol. 17 / Issue 04

Funding:

National Natural Science Foundation of China; National Key Research and Development Program;

Keywords:

meta-learning; reinforcement learning; few-shot learning; negative adaptation;

DOI:

10.1007/s11704-022-2037-1

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Meta-learning has been widely applied to few-shot reinforcement learning problems, where the goal is an agent that can learn quickly in a new task. However, these algorithms often neglect isolated tasks in pursuit of average performance, which may cause negative adaptation on those tasks, and they usually require sufficient learning in a stationary task distribution. In this paper, we present a hierarchical framework of double meta-learning consisting of three processes: classification, meta-learning, and re-adaptation. First, in the classification process, we partition tasks into several subsets, each regarded as a task category, according to the learned parameters of each task, which separates out isolated tasks. Second, in the meta-learning process, we learn a category parameter for each subset via meta-learning. Simultaneously, based on the gradient of each category parameter in each subset, we apply meta-learning again to learn a new meta-parameter for the whole task set, which can serve as the initial parameter for a new task. Finally, in the re-adaptation process, we adapt the parameter of the new task in two steps, using the meta-parameter and then the appropriate category parameter. Experimentally, we demonstrate that our algorithm prevents the agent from negative adaptation without sacrificing average performance over the whole task set. Additionally, our algorithm adapts more rapidly during re-adaptation. Moreover, we show that our algorithm performs well with fewer samples when the agent is exposed to an online meta-learning setting.
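
The three processes named in the abstract can be illustrated on a toy problem. The following is only a minimal sketch under loud assumptions, not the authors' implementation: each task is a quadratic loss with a hypothetical optimum `center`, classification is replaced by plain k-means over the learned task parameters, both meta-learning levels use a Reptile-style first-order update instead of the paper's gradients, and the rule for picking the "appropriate" category (nearest category parameter after one step from the meta-parameter) is a guess. All function names and constants are illustrative.

```python
import numpy as np

LR = 0.5  # inner-loop step size (assumed value)

def loss(theta, center):
    """Toy task loss: 0.5 * ||theta - center||^2 (assumption, not an RL objective)."""
    return 0.5 * float(np.sum((theta - center) ** 2))

def adapt(theta, center, steps=1):
    """Inner-loop gradient descent; the gradient of the toy loss is (theta - center)."""
    for _ in range(steps):
        theta = theta - LR * (theta - center)
    return theta

def classify(params, k, iters=20):
    """Stand-in classifier: k-means with deterministic farthest-point initialization."""
    centroids = [params[0]]
    for _ in range(1, k):
        gaps = np.min([np.linalg.norm(params - c, axis=1) for c in centroids], axis=0)
        centroids.append(params[int(np.argmax(gaps))])
    centroids = np.array(centroids)
    labels = np.zeros(len(params), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(params[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = params[labels == j].mean(axis=0)
    return labels

def meta_learn(centers, labels, k, outer_lr=0.1, epochs=300):
    """Two-level first-order meta-learning (Reptile-style swap-in)."""
    dim = centers.shape[1]
    cat = np.zeros((k, dim))   # one category parameter per task subset
    meta = np.zeros(dim)       # meta-parameter for the whole task set
    for _ in range(epochs):
        for j in range(k):
            members = centers[labels == j]
            adapted = np.array([adapt(cat[j], c) for c in members])
            cat[j] -= outer_lr * (cat[j] - adapted.mean(axis=0))
        # Second level: treat each category parameter as the optimum of a "task".
        adapted = np.array([adapt(meta, c) for c in cat])
        meta -= outer_lr * (meta - adapted.mean(axis=0))
    return cat, meta

def re_adapt(meta, cat, new_center):
    """Two-step adaptation: start from meta, then continue from the nearest category."""
    first = adapt(meta, new_center)
    j = int(np.argmin(np.linalg.norm(cat - first, axis=1)))
    return j, adapt(cat[j], new_center)

# Two clusters of tasks plus one isolated task (hypothetical data).
centers = np.array([[-0.1, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.1], [5.1, 5.0], [4.9, 4.9],
                    [-6.0, 6.0]])
task_params = np.array([adapt(np.zeros(2), c, steps=3) for c in centers])
labels = classify(task_params, k=3)
cat, meta = meta_learn(centers, labels, k=3)

new_center = np.array([-5.8, 6.1])  # a new task resembling the isolated one
j, theta_new = re_adapt(meta, cat, new_center)
print("labels:", labels.tolist(), "chosen category:", j)
print("loss after meta-only step:", loss(adapt(meta, new_center), new_center))
print("loss after re-adaptation:", loss(theta_new, new_center))
```

In this toy setting the isolated task ends up in its own subset, so a new task that resembles it restarts from that category parameter rather than from an average that is dominated by the two larger clusters.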

Pages: 11
