Multitask reinforcement learning on the distribution of MDPs

Cited by: 0
Authors
Tanaka, F [1]
Yamamura, M [1]
Affiliations
[1] Tokyo Inst Technol, Dept Computat Intelligence & Syst Sci, Tokyo 152, Japan
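Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract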
In this paper we address a new problem in reinforcement learning. We consider an agent that faces multiple learning tasks within its lifetime. The agent's objective is to maximize its total reward over its lifetime as well as the conventional return in each task. To achieve this, the agent must be able to retain its past learning experiences and use them to improve future learning performance. We attempt to state this problem formally. The central idea is to introduce an environmental class, BV-MDPs, which is defined by a distribution over MDPs. As an approach to exploiting past learning experiences, we focus on statistics (mean and deviation) of the agent's value tables. The mean can be used as the initial values of the table when a new task is presented. The deviation can be viewed as a measure of the reliability of the mean, and we use it in calculating the priority of simulated backups. We conduct computer simulation experiments to evaluate the effectiveness of this approach.
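The use of value-table statistics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give formulas, so the function names `table_statistics` and `backup_priority`, the use of the population standard deviation as the "deviation", and the priority rule itself are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def table_statistics(past_tables):
    """Per-entry mean and deviation over value tables from past tasks.

    past_tables: list of tables, each a list of rows (one row per state)
    holding one value per action. All tables must share the same shape.
    Assumption: "deviation" is taken to be the population standard deviation.
    """
    n_states = len(past_tables[0])
    n_actions = len(past_tables[0][0])
    mean_table = [
        [mean([t[s][a] for t in past_tables]) for a in range(n_actions)]
        for s in range(n_states)
    ]
    dev_table = [
        [pstdev([t[s][a] for t in past_tables]) for a in range(n_actions)]
        for s in range(n_states)
    ]
    return mean_table, dev_table


def backup_priority(td_error, deviation, weight=1.0):
    """Hypothetical priority for a simulated backup (an assumption).

    A large deviation means the mean is a less reliable initial value,
    so the corresponding entry is given a higher priority for backup.
    """
    return abs(td_error) * (1.0 + weight * deviation)


# Two past tasks, one state, two actions.
past = [[[1.0, 0.0]], [[3.0, 0.0]]]
mean_table, dev_table = table_statistics(past)

# A new task starts from the mean table instead of zeros.
new_table = [row[:] for row in mean_table]
```

Under these assumptions, an entry whose values varied widely across past tasks starts from the same mean as any other entry but is revisited sooner by simulated backups, which reflects the abstract's reading of the deviation as a reliability measure.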
Pages: 1108-1113
Page count: 6