Multitask reinforcement learning on the distribution of MDPs

Cited by: 0
Authors: Tanaka, F [1]; Yamamura, M [1]
Affiliation: [1] Tokyo Inst Technol, Dept Computat Intelligence & Syst Sci, Tokyo 152, Japan
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP [automation technology; computer technology]
Discipline code: 0812
Abstract
In this paper we address a new problem in reinforcement learning: an agent that faces multiple learning tasks within its lifetime. The agent's objective is to maximize its total reward over the lifetime, in addition to the conventional return within each task. To achieve this, it must retain its past learning experiences and exploit them to improve future learning performance. We formalize this problem by introducing an environmental class, BV-MDPs, defined by a distribution over MDPs. As an approach to exploiting past learning experiences, we focus on statistics (mean and deviation) of the agent's value tables: the mean serves as the initial value table when a new task is presented, while the deviation can be viewed as a measure of the mean's reliability and is used to compute the priority of simulated backups. We evaluate the effectiveness of this approach in computer-simulation experiments.
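The transfer mechanism described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it keeps a running mean and deviation of value tables over completed tasks (via Welford's algorithm), uses the mean to initialize the value table of a new task, and weights the priority of a simulated backup by the deviation at that state (a less reliable transferred mean gets a higher priority). The class and method names are hypothetical.

```python
import numpy as np

class ValueStatistics:
    """Running mean and deviation of tabular state values across tasks."""

    def __init__(self, n_states):
        self.n = 0                        # number of tasks folded in so far
        self.mean = np.zeros(n_states)    # running mean of value tables
        self.m2 = np.zeros(n_states)      # running sum of squared deviations

    def update(self, value_table):
        """Fold in the value table learned on a finished task (Welford)."""
        self.n += 1
        delta = value_table - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value_table - self.mean)

    @property
    def deviation(self):
        """Sample standard deviation per state (zero until two tasks seen)."""
        if self.n < 2:
            return np.zeros_like(self.mean)
        return np.sqrt(self.m2 / (self.n - 1))

    def init_new_task(self):
        """Use the mean as the initial value table for a new task."""
        return self.mean.copy()

    def backup_priority(self, state, bellman_error):
        """Priority of a simulated backup: Bellman error scaled up where
        the transferred mean is less reliable (larger deviation)."""
        return abs(bellman_error) * (1.0 + self.deviation[state])
```

After each task finishes, `update` is called with the learned table; the next task then starts from `init_new_task()` instead of an arbitrary initialization, and its prioritized sweeping queue can be ordered by `backup_priority`.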
Pages: 1108-1113 (6 pages)
Related papers (showing 10 of 50)
  • [1] Reinforcement learning for MDPs with constraints
    Geibel, Peter
    [J]. MACHINE LEARNING: ECML 2006, PROCEEDINGS, 2006, 4212 : 646 - 653
  • [2] Efficient reinforcement learning in factored MDPs
    Kearns, M
    Koller, D
    [J]. IJCAI-99: PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 & 2, 1999, : 740 - 747
  • [3] Inverse reinforcement learning in contextual MDPs
    Belogolovsky, Stav
    Korsunsky, Philip
    Mannor, Shie
    Tessler, Chen
    Zahavy, Tom
    [J]. MACHINE LEARNING, 2021, 110 (09) : 2295 - 2334
  • [5] Reinforcement learning in finite MDPs: PAC analysis
    Strehl, Alexander L.
    Li, Lihong
    Littman, Michael L.
    [J]. Journal of Machine Learning Research, 2009, 10 : 2413 - 2444
  • [6] Knowledge Revision for Reinforcement Learning with Abstract MDPs
    Efthymiadis, Kyriakos
    Devlin, Sam
    Kudenko, Daniel
    [J]. AAMAS'14: PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2014, : 1535 - 1536
  • [7] Knowledge Revision for Reinforcement Learning with Abstract MDPs
    Efthymiadis, Kyriakos
    Kudenko, Daniel
    [J]. PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS (AAMAS'15), 2015, : 763 - 770
  • [8] Reinforcement Learning in Parametric MDPs with Exponential Families
    Chowdhury, Sayak Ray
    Gopalan, Aditya
    Maillard, Odalric-Ambrym
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [10] TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs
    Kozlova, Olga
    Sigaud, Olivier
    Meyer, Christophe
    [J]. FROM ANIMALS TO ANIMATS 11, 2010, 6226 : 489 - +