Multitask reinforcement learning on the distribution of MDPs

Cited by: 0
Authors: Tanaka, F [1]; Yamamura, M [1]
Affiliation: [1] Tokyo Inst Technol, Dept Computat Intelligence & Syst Sci, Tokyo 152, Japan
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP [automation technology; computer technology]
Discipline code: 0812
Abstract
In this paper we address a new problem in reinforcement learning: an agent that faces multiple learning tasks within its lifetime. The agent's objective is to maximize its total reward over the lifetime, in addition to the conventional return within each task. To achieve this, it must retain its past learning experiences and exploit them to improve future learning performance. We formalize this problem by introducing an environmental class, BV-MDPs, defined by a distribution over MDPs. As an approach to exploiting past learning experiences, we focus on statistics (mean and deviation) of the agent's value tables: the mean serves as the initial value table when a new task is presented, while the deviation can be viewed as a measure of the mean's reliability and is used to compute the priority of simulated backups. We evaluate the effectiveness of this approach in computer-simulation experiments.
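The transfer mechanism described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it keeps a running mean and deviation of value tables over completed tasks (via Welford's algorithm), uses the mean to initialize the value table of a new task, and weights the priority of a simulated backup by the deviation at that state (a less reliable transferred mean gets a higher priority). The class and method names are hypothetical.

```python
import numpy as np

class ValueStatistics:
    """Running mean and deviation of tabular state values across tasks."""

    def __init__(self, n_states):
        self.n = 0                        # number of tasks folded in so far
        self.mean = np.zeros(n_states)    # running mean of value tables
        self.m2 = np.zeros(n_states)      # running sum of squared deviations

    def update(self, value_table):
        """Fold in the value table learned on a finished task (Welford)."""
        self.n += 1
        delta = value_table - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value_table - self.mean)

    @property
    def deviation(self):
        """Sample standard deviation per state (zero until two tasks seen)."""
        if self.n < 2:
            return np.zeros_like(self.mean)
        return np.sqrt(self.m2 / (self.n - 1))

    def init_new_task(self):
        """Use the mean as the initial value table for a new task."""
        return self.mean.copy()

    def backup_priority(self, state, bellman_error):
        """Priority of a simulated backup: Bellman error scaled up where
        the transferred mean is less reliable (larger deviation)."""
        return abs(bellman_error) * (1.0 + self.deviation[state])
```

After each task finishes, `update` is called with the learned table; the next task then starts from `init_new_task()` instead of an arbitrary initialization, and its prioritized sweeping queue can be ordered by `backup_priority`.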
Pages: 1108-1113 (6 pages)
Related papers (showing 10 of 50)
  • [1] Reinforcement learning for MDPs with constraints
    Geibel, Peter
    [J]. MACHINE LEARNING: ECML 2006, PROCEEDINGS, 2006, 4212 : 646 - 653
  • [2] Efficient reinforcement learning in factored MDPs
    Kearns, M
    Koller, D
    [J]. IJCAI-99: PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 & 2, 1999, : 740 - 747
  • [3] Inverse reinforcement learning in contextual MDPs
    Belogolovsky, Stav
    Korsunsky, Philip
    Mannor, Shie
    Tessler, Chen
    Zahavy, Tom
    [J]. MACHINE LEARNING, 2021, 110 (09) : 2295 - 2334
  • [5] Reinforcement learning in finite MDPs: PAC analysis
    Strehl, Alexander L.
    Li, Lihong
    Littman, Michael L.
    [J]. Journal of Machine Learning Research, 2009, 10 : 2413 - 2444
  • [6] Knowledge Revision for Reinforcement Learning with Abstract MDPs
    Efthymiadis, Kyriakos
    Devlin, Sam
    Kudenko, Daniel
    [J]. AAMAS'14: PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2014, : 1535 - 1536
  • [7] Knowledge Revision for Reinforcement Learning with Abstract MDPs
    Efthymiadis, Kyriakos
    Kudenko, Daniel
    [J]. PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS (AAMAS'15), 2015, : 763 - 770
  • [8] Reinforcement Learning in Parametric MDPs with Exponential Families
    Chowdhury, Sayak Ray
    Gopalan, Aditya
    Maillard, Odalric-Ambrym
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [10] TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs
    Kozlova, Olga
    Sigaud, Olivier
    Meyer, Christophe
    [J]. FROM ANIMALS TO ANIMATS 11, 2010, 6226 : 489 - +