Meta-Reinforcement Learning Algorithm Based on Reward and Dynamic Inference

Cited: 0
Authors
Chen, Jinhao [1 ]
Zhang, Chunhong [2 ]
Hu, Zheng [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100088, Peoples R China
[2] Beijing Univ Posts & Telecommun, Key Lab Universal Wireless Commun, Minist Educ, Beijing 100088, Peoples R China
Keywords
Meta-Reinforcement Learning; Variational Inference; Hidden Feature;
DOI
10.1007/978-981-97-2259-4_17
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Meta-Reinforcement Learning aims to rapidly solve unseen tasks that share similar structures. However, the agent relies heavily on a large amount of experience during the meta-training phase, which makes high sample efficiency a formidable challenge. Current methods typically adapt to novel tasks within the Meta-Reinforcement Learning framework through task inference. Unfortunately, these approaches still exhibit limitations when faced with a high-complexity task space. In this paper, we propose a Meta-Reinforcement Learning method based on reward and dynamic inference. We introduce independent reward and dynamic inference encoders, which sample task-specific context information to capture deep-level features of task goals and dynamics. By reducing the task inference space, the agent effectively learns the structure shared across tasks and acquires a deeper understanding of the differences between them. We illustrate the performance degradation caused by high task-inference complexity and demonstrate that our method outperforms previous algorithms in sample efficiency.
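The core idea the abstract describes, splitting task inference into two independent variational encoders, one over reward-relevant context (s, a, r) and one over dynamics-relevant context (s, a, s'), can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the `GaussianContextEncoder` class, the PEARL-style product-of-Gaussians posterior, and the use of a single random linear layer in place of a trained network are all assumptions made for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

class GaussianContextEncoder:
    """Amortized variational encoder: maps a batch of context tuples to a
    diagonal-Gaussian posterior over a task latent, then samples from it.
    (A random linear layer stands in for a trained network here.)"""
    def __init__(self, in_dim, latent_dim):
        self.W = rng.normal(scale=0.1, size=(in_dim, 2 * latent_dim))
        self.latent_dim = latent_dim

    def posterior(self, context):               # context: (N, in_dim)
        h = context @ self.W                    # per-transition Gaussian factors
        mu, log_var = h[:, :self.latent_dim], h[:, self.latent_dim:]
        prec = np.exp(-log_var)
        # Product of the N factors with a N(0, I) prior (PEARL-style):
        var = 1.0 / (1.0 + prec.sum(axis=0))
        mean = var * (prec * mu).sum(axis=0)
        return mean, var

    def sample(self, context):
        mean, var = self.posterior(context)
        return mean + np.sqrt(var) * rng.normal(size=self.latent_dim)

s_dim, a_dim, latent = 4, 2, 3
# Independent encoders over *different* slices of each transition,
# which is what shrinks the task inference space:
reward_enc   = GaussianContextEncoder(s_dim + a_dim + 1, latent)   # (s, a, r)
dynamics_enc = GaussianContextEncoder(2 * s_dim + a_dim, latent)   # (s, a, s')

# A toy context batch of 8 transitions (s, a, r, s'):
s, a = rng.normal(size=(8, s_dim)), rng.normal(size=(8, a_dim))
r, s_next = rng.normal(size=(8, 1)), rng.normal(size=(8, s_dim))

z_reward   = reward_enc.sample(np.concatenate([s, a, r], axis=1))
z_dynamics = dynamics_enc.sample(np.concatenate([s, a, s_next], axis=1))

# The policy would then condition on both latents: pi(a | s, z_reward, z_dynamics)
task_embedding = np.concatenate([z_reward, z_dynamics])
print(task_embedding.shape)  # (6,)
```

Conditioning the policy on the concatenated latents lets reward goals and transition dynamics vary independently across tasks, rather than forcing a single latent to explain both.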
Pages: 223-234
Page count: 12
Related Papers
50 records in total
  • [1] A Federated Meta-Reinforcement Learning Algorithm Based on Gradient Correction
    Qin, Zerui
    Yue, Sheng
    PROCEEDINGS OF THE ACM TURING AWARD CELEBRATION CONFERENCE-CHINA 2024, ACM-TURC 2024, 2024, : 220 - 221
  • [2] A Meta-Reinforcement Learning Algorithm for Causal Discovery
    Sauter, Andreas
    Acar, Erman
    Francois-Lavet, Vincent
    CONFERENCE ON CAUSAL LEARNING AND REASONING, VOL 213, 2023, 213 : 602 - 619
  • [3] Meta-Reinforcement Learning With Dynamic Adaptiveness Distillation
    Hu, Hangkai
    Huang, Gao
    Li, Xiang
    Song, Shiji
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (03) : 1454 - 1464
  • [4] Dynamic Channel Access via Meta-Reinforcement Learning
    Lu, Ziyang
    Gursoy, M. Cenk
    2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021,
  • [5] Off-Policy Meta-Reinforcement Learning With Belief-Based Task Inference
    Imagawa, Takahisa
    Hiraoka, Takuya
    Tsuruoka, Yoshimasa
    IEEE ACCESS, 2022, 10 : 49494 - 49507
  • [6] Hypernetworks in Meta-Reinforcement Learning
    Beck, Jacob
    Jackson, Matthew
    Vuorio, Risto
    Whiteson, Shimon
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1478 - 1487
  • [7] Meta-Reinforcement Learning in Non-Stationary and Dynamic Environments
    Bing, Zhenshan
    Lerch, David
    Huang, Kai
    Knoll, Alois
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 3476 - 3491
  • [8] Model-based Adversarial Meta-Reinforcement Learning
    Lin, Zichuan
    Thomas, Garrett
    Yang, Guangwen
    Ma, Tengyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [9] Model-Based Meta-reinforcement Learning for Hyperparameter Optimization
    Albrechts, Jeroen
    Martin, Hugo M.
    Tavakol, Maryam
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2024, PT I, 2025, 15346 : 27 - 39
  • [10] Meta-Reinforcement Learning Based Resource Allocation for Dynamic V2X Communications
    Yuan, Yi
    Zheng, Gan
    Wong, Kai-Kit
    Letaief, Khaled B.
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2021, 70 (09) : 8964 - 8977