Meta-Reinforcement Learning in Nonstationary and Nonparametric Environments

Cited: 8
Authors
Bing, Zhenshan [1 ]
Knak, Lukas [1 ]
Cheng, Long [2 ]
Morin, Fabrice O. [1 ]
Huang, Kai [3 ]
Knoll, Alois [1 ]
Affiliations
[1] Tech Univ Munich, Dept Informat, D-85748 Munich, Germany
[2] Wenzhou Univ, Coll Comp Sci & Artificial Intelligence, Wenzhou 325035, Peoples R China
[3] Sun Yat-sen Univ, Sch Data & Comp Sci, Guangzhou 543000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Training; Adaptation models; Robots; Probabilistic logic; Turning; Switches; Gaussian variational autoencoder (VAE); meta-reinforcement learning (meta-RL); robotic control; task adaptation; task inference;
DOI
10.1109/TNNLS.2023.3270298
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent state-of-the-art artificial agents lack the ability to adapt rapidly to new tasks, as they are trained exclusively for specific objectives and require massive amounts of interaction to learn new skills. Meta-reinforcement learning (meta-RL) addresses this challenge by leveraging knowledge learned from training tasks to perform well in previously unseen tasks. However, current meta-RL approaches limit themselves to narrow parametric and stationary task distributions, ignoring qualitative differences and nonstationary changes between tasks that occur in the real world. In this article, we introduce a Task-Inference-based meta-RL algorithm using explicitly parameterized Gaussian variational autoencoders (VAEs) and gated Recurrent units (TIGR), designed for nonparametric and nonstationary environments. We employ a generative model involving a VAE to capture the multimodality of the tasks. We decouple the policy training from the task-inference learning and efficiently train the inference mechanism on the basis of an unsupervised reconstruction objective. We establish a zero-shot adaptation procedure to enable the agent to adapt to nonstationary task changes. We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared with state-of-the-art meta-RL approaches in terms of sample efficiency (three to ten times faster), asymptotic performance, and applicability in nonparametric and nonstationary environments with zero-shot adaptation. Videos can be viewed at https://videoviewsite.wixsite.com/tigr.
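The abstract describes a task-inference mechanism built from a gated recurrent unit that encodes collected transitions into the parameters of a Gaussian latent task variable, trained with an unsupervised reconstruction objective and decoupled from policy learning. The following is a minimal sketch of that idea, not the authors' implementation: the paper uses explicitly parameterized Gaussian VAEs to capture task multimodality, whereas this sketch uses a single Gaussian encoder; the class and function names (`TaskInferenceVAE`, `elbo_loss`) and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class TaskInferenceVAE(nn.Module):
    """Hypothetical sketch: GRU encoder over a trajectory of transition
    features -> Gaussian latent task variable z -> decoder that
    reconstructs the last transition (unsupervised objective)."""

    def __init__(self, transition_dim, latent_dim=8, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(transition_dim, hidden_dim, batch_first=True)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, transition_dim))

    def forward(self, transitions):
        # transitions: (batch, seq_len, transition_dim), e.g. concatenated
        # (state, action, reward, next_state) features per time step
        _, h = self.gru(transitions)                 # h: (1, batch, hidden)
        mu, logvar = self.mu(h[-1]), self.logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.decoder(z), mu, logvar

def elbo_loss(recon, target, mu, logvar):
    # reconstruction error plus KL divergence to a standard normal prior
    rec = ((recon - target) ** 2).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return rec + kl
```

Because the latent z is recomputed from the most recent transitions at every step, the policy (conditioned on z) can react to nonstationary task switches without gradient updates, which is the zero-shot adaptation behavior the abstract claims; the mixture components of the actual TIGR encoder additionally handle qualitatively distinct (nonparametric) tasks.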
Pages: 13604-13618
Number of pages: 15
Related Papers
50 records in total
  • [21] Model-based Adversarial Meta-Reinforcement Learning
    Lin, Zichuan
    Thomas, Garrett
    Yang, Guangwen
    Ma, Tengyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [22] Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks
    Schoettler, Gerrit
    Nair, Ashvin
    Ojea, Juan Aparicio
    Levine, Sergey
    Solowjow, Eugen
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 9728 - 9735
  • [23] Meta-reinforcement learning for edge caching in vehicular networks
    Sakr H.
    Elsabrouty M.
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (04) : 4607 - 4619
  • [24] Doubly Robust Augmented Transfer for Meta-Reinforcement Learning
    Jiang, Yuankun
    Kan, Nuowen
    Li, Chenglin
    Dai, Wenrui
    Zou, Junni
    Xiong, Hongkai
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [25] Wireless Power Control via Meta-Reinforcement Learning
    Lu, Ziyang
    Gursoy, M. Cenk
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022, : 1562 - 1567
  • [26] Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning
    Beyene, Sofanit Wubeshet
    Han, Ji-Hyeong
    ELECTRONICS, 2022, 11 (24)
  • [27] PAC-Bayesian offline Meta-reinforcement learning
    Sun, Zheng
    Jing, Chenheng
    Guo, Shangqi
    An, Lingling
    APPLIED INTELLIGENCE, 2023, 53 (22) : 27128 - 27147
  • [28] Meta-Reinforcement Learning for Multiple Traffic Signals Control
    Lou, Yican
    Wu, Jia
    Ran, Yunchuan
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 4264 - 4268
  • [29] Dynamic Channel Access via Meta-Reinforcement Learning
    Lu, Ziyang
    Gursoy, M. Cenk
    2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021,
  • [30] Meta-Reinforcement Learning via Exploratory Task Clustering
    Chu, Zhendong
    Cai, Renqin
    Wang, Hongning
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10, 2024, : 11633 - 11641