Multitask reinforcement learning on the distribution of MDPs

Cited by: 0
Authors
Tanaka, F [1]
Yamamura, M [1]
Affiliations
[1] Tokyo Inst Technol, Dept Computat Intelligence & Syst Sci, Tokyo 152, Japan
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper we address a new problem in reinforcement learning. We consider an agent that faces multiple learning tasks within its lifetime. The agent's objective is to maximize its total reward over the lifetime as well as the conventional return in each task. To achieve this, the agent must be endowed with the ability to retain its past learning experiences and use them to improve future learning performance. We attempt to formalize this problem. The central idea is to introduce an environmental class, BV-MDPs, which is defined by a distribution over MDPs. As an approach to exploiting past learning experiences, we focus on statistics (mean and deviation) of the agent's value tables. The mean can be used as the initial values of the table when a new task is presented. The deviation can be viewed as a measure of the reliability of the mean, and we use it in calculating the priority of simulated backups. We conduct experiments in computer simulation to evaluate the effectiveness of this approach.
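The two uses of value-table statistics described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of Q-tables stored as NumPy arrays, and in particular the priority formula (temporal-difference error scaled by one plus the deviation) are assumptions made here for illustration, since the abstract does not state how the deviation enters the priority.

```python
import numpy as np


def value_table_statistics(past_q_tables):
    """Element-wise mean and standard deviation over value tables from past tasks."""
    stacked = np.stack(past_q_tables)  # shape: (n_tasks, n_states, n_actions)
    return stacked.mean(axis=0), stacked.std(axis=0)


def init_q_table(mean_q):
    """Start a new task from the mean of the past value tables."""
    return mean_q.copy()


def backup_priority(td_error, deviation, scale=1.0):
    """Hypothetical priority for a simulated backup (assumption, not from the paper).

    A large deviation means the mean is a less reliable initial value,
    so the corresponding entry is given a higher priority.
    """
    return abs(td_error) * (1.0 + scale * deviation)


# Two toy value tables (2 states x 2 actions) from earlier tasks.
past = [
    np.array([[0.0, 1.0], [2.0, 2.0]]),
    np.array([[2.0, 1.0], [2.0, 4.0]]),
]
mean_q, std_q = value_table_statistics(past)
q_new = init_q_table(mean_q)

# Same TD error, different reliability of the initial value.
p_uncertain = backup_priority(0.5, std_q[0, 0])
p_reliable = backup_priority(0.5, std_q[0, 1])
```

In this toy example the entry whose past values disagree receives the higher priority, so a prioritized-sweeping style planner would revisit it first.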
Pages: 1108-1113
Page count: 6
Related papers
50 in total
  • [31] Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
    Papini, Matteo
    Tirinzoni, Andrea
    Pacchiano, Aldo
    Restelli, Marcello
    Lazaric, Alessandro
    Pirotta, Matteo
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [32] Efficient Multitask Reinforcement Learning Without Performance Loss
    Baek, Jongchan
    Baek, Seungmin
    Han, Soohee
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 35 (10) : 1 - 15
  • [33] Multitask Augmented Random Search in deep reinforcement learning
    Thanh, Le Tien
    Thang, Ta Bao
    Van Cuong, Le
    Binh, Huynh Thi Thanh
    [J]. APPLIED SOFT COMPUTING, 2024, 160
  • [34] Multitask Neuroevolution for Reinforcement Learning With Long and Short Episodes
    Zhang, Nick
    Gupta, Abhishek
    Chen, Zefeng
    Ong, Yew-Soon
    [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (03) : 1474 - 1486
  • [35] Adaptive Multifactorial Evolutionary Optimization for Multitask Reinforcement Learning
    Martinez, Aritz D.
    Del Ser, Javier
    Osaba, Eneko
    Herrera, Francisco
    [J]. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2022, 26 (02) : 233 - 247
  • [36] Reinforcement Learning through Global Stochastic Search in N-MDPs
    Leonetti, Matteo
    Iocchi, Luca
    Ramamoorthy, Subramanian
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2011, 6912 : 326 - 340
  • [37] Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
    Domingues, Omar Darwiche
    Menard, Pierre
    Kaufmann, Emilie
    Valko, Michal
    [J]. ALGORITHMIC LEARNING THEORY, VOL 132, 2021, 132
  • [38] Multitask Learning and Reinforcement Learning for Personalized Dialog Generation: An Empirical Study
    Yang, Min
    Huang, Weiyi
    Tu, Wenting
    Qu, Qiang
    Shen, Ying
    Lei, Kai
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (01) : 49 - 62
  • [39] Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
    Dann, Christoph
    Mansour, Yishay
    Mohri, Mehryar
    Sekhari, Ayush
    Sridharan, Karthik
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [40] Model-based reinforcement learning in factored-state MDPs
    Strehl, Alexander L.
    [J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 103 - 110