Meta-Reinforcement Learning by Tracking Task Non-stationarity

Cited by: 0
Authors:
Poiani, Riccardo [1 ]
Tirinzoni, Andrea [2 ]
Restelli, Marcello [1 ]
Affiliations:
[1] Politecn Milan, Milan, Italy
[2] Inria Lille, Lille, France
Keywords:
DOI: (not available)
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
Many real-world domains are subject to a structured non-stationarity that affects both the agent's goals and the environmental dynamics. Meta-reinforcement learning (RL) has proven successful for training agents that quickly adapt to related tasks. However, most existing meta-RL algorithms for non-stationary domains either make strong assumptions about the task-generation process or require sampling from it at training time. In this paper, we propose a novel algorithm (TRIO) that optimizes for the future by explicitly tracking the task evolution through time. At training time, TRIO learns a variational module to quickly identify latent parameters from experience samples. This module is learned jointly with an optimal exploration policy that takes task uncertainty into account. At test time, TRIO tracks the evolution of the latent parameters online, hence reducing the uncertainty over future tasks and obtaining fast adaptation through the meta-learned policy. Unlike most existing methods, TRIO does not assume a Markovian task-evolution process, does not require information about the non-stationarity at training time, and captures complex changes occurring in the environment. We evaluate our algorithm on several simulated problems and show that it outperforms competitive baselines.
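The core idea in the abstract, tracking inferred latent task parameters over time and extrapolating them to anticipate the next task, can be sketched minimally. The snippet below is an illustrative assumption, not the paper's implementation: the function name and the choice of per-dimension polynomial curve fitting are hypothetical stand-ins for whatever tracking model TRIO actually uses.

```python
import numpy as np

def predict_next_latent(latent_history, degree=2):
    """Extrapolate the next task's latent parameters from past estimates.

    Fits an independent polynomial in time to each latent dimension and
    evaluates it one step into the future. Illustrative sketch only;
    the real tracker may use a different regression model.
    """
    latent_history = np.asarray(latent_history, dtype=float)  # shape (T, d)
    T, d = latent_history.shape
    t = np.arange(T)
    next_latent = np.empty(d)
    for i in range(d):
        # Cap the degree so the fit is well-posed with few observations.
        coeffs = np.polyfit(t, latent_history[:, i], deg=min(degree, T - 1))
        next_latent[i] = np.polyval(coeffs, T)  # evaluate at the next step
    return next_latent

# Example: a 2-D latent drifting linearly across 5 tasks.
history = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
print(predict_next_latent(history, degree=1))  # extrapolates to ~[0.5, 0.5]
```

The predicted latent would then condition the meta-learned policy, so the agent starts the next task with a reduced-uncertainty prior rather than adapting from scratch.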
Pages: 2899-2905 (7 pages)
Related papers (50 in total; items [31]-[40] shown):
  • [31] Mao, Weichao; Qiu, Haoran; Wang, Chen; Franke, Hubertus; Kalbarczyk, Zbigniew; Iyer, Ravishankar K.; Basar, Tamer. Multi-Agent Meta-Reinforcement Learning: Sharper Convergence Rates with Task Similarity. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [32] Hussein, Maryem; Keshk, Marwa; Hussein, Aya. Non-stationarity Detection in Model-Free Reinforcement Learning via Value Function Monitoring. Advances in Artificial Intelligence, AI 2023, Pt. II, 2024, 14472: 350-362.
  • [33] Sauter, Andreas; Acar, Erman; Francois-Lavet, Vincent. A Meta-Reinforcement Learning Algorithm for Causal Discovery. Conference on Causal Learning and Reasoning, 2023, 213: 602-619.
  • [34] Ben-Iwhiwhu, Eseoghene; Dick, Jeffery; Ketz, Nicholas A.; Pilly, Praveen K.; Soltoggio, Andrea. Context meta-reinforcement learning via neuromodulation. Neural Networks, 2022, 152: 70-79.
  • [35] Mahony, Amanda. Formalising Performance Guarantees in Meta-Reinforcement Learning. Formal Methods and Software Engineering, ICFEM 2018, 2018, 11232: 469-472.
  • [36] Hattori, Ryoma; Hedrick, Nathan G.; Jain, Anant; Chen, Shuqi; You, Hanjia; Hattori, Mariko; Choi, Jun-Hyeok; Lim, Byung Kook; Yasuda, Ryohei; Komiyama, Takaki. Meta-reinforcement learning via orbitofrontal cortex. Nature Neuroscience, 2023, 26(12): 2182-2191.
  • [37] Wu, Zheng; Xie, Yichen; Lian, Wenzhao; Wang, Changhao; Guo, Yanjiang; Chen, Jianyu; Schaal, Stefan; Tomizuka, Masayoshi. Zero-Shot Policy Transfer with Disentangled Task Representation of Meta-Reinforcement Learning. 2023 IEEE International Conference on Robotics and Automation (ICRA 2023), 2023: 7169-7175.
  • [38] Imagawa, Takahisa; Hiraoka, Takuya; Tsuruoka, Yoshimasa. Off-Policy Meta-Reinforcement Learning With Belief-Based Task Inference. IEEE Access, 2022, 10: 49494-49507.
  • [39] Wang, Jane X.; Kurth-Nelson, Zeb; Kumaran, Dharshan; Tirumala, Dhruva; Soyer, Hubert; Leibo, Joel Z.; Hassabis, Demis; Botvinick, Matthew. Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience, 2018, 21(6): 860+.
  • [40] Likmeta, Amarildo; Metelli, Alberto Maria; Ramponi, Giorgia; Tirinzoni, Andrea; Giuliani, Matteo; Restelli, Marcello. Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems. Machine Learning, 2021, 110: 2541-2576.