Multitask reinforcement learning on the distribution of MDPs

Cited by: 0

Authors:

Tanaka, F [1]

Yamamura, M [1]

Affiliations:

[1] Tokyo Inst Technol, Dept Computat Intelligence & Syst Sci, Tokyo 152, Japan

Source:

2003 IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION, VOLS I-III, PROCEEDINGS | 2003

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In this paper we address a new problem in reinforcement learning: an agent that faces multiple learning tasks within its lifetime. The agent's objective is to maximize its total reward over the lifetime as well as the conventional return in each task. To achieve this, it must be endowed with the ability to retain its past learning experiences and use them to improve future learning performance. We formulate this problem formally. The central idea is to introduce an environmental class, BV-MDPs, that is defined by a distribution over MDPs. As an approach to exploiting past learning experiences, we focus on statistics (mean and deviation) of the agent's value tables. The mean can be used as the initial values of the table when a new task is presented. The deviation can be viewed as a measure of the reliability of the mean, and we use it in calculating the priority of simulated backups. We evaluate the effectiveness of this approach through experiments in computer simulation.
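
The use of value-table statistics described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the priority formula, so `backup_priority` and its `weight` parameter are hypothetical, as are the function names and the dictionary-based table layout. The sketch assumes every past task shares the same state-action keys.

```python
from statistics import mean, pstdev


def table_statistics(past_tables):
    """Element-wise mean and population standard deviation over value tables.

    past_tables: list of dicts mapping (state, action) -> learned value,
    one dict per previously learned task, all sharing the same keys
    (an assumption of this sketch).
    """
    keys = list(past_tables[0].keys())
    mean_table = {k: mean([t[k] for t in past_tables]) for k in keys}
    dev_table = {k: pstdev([t[k] for t in past_tables]) for k in keys}
    return mean_table, dev_table


def initial_table(mean_table):
    """Initialize the value table for a new task from the mean (as a copy)."""
    return dict(mean_table)


def backup_priority(td_error, deviation, weight=1.0):
    """Hypothetical priority rule, not taken from the paper:
    the absolute TD error is scaled up where the mean is less reliable
    (larger deviation across past tasks).
    """
    return abs(td_error) * (1.0 + weight * deviation)


# Two toy value tables from earlier tasks (illustrative numbers only).
past = [
    {("s0", "a"): 1.0, ("s0", "b"): 0.0},
    {("s0", "a"): 3.0, ("s0", "b"): 0.0},
]
mean_table, dev_table = table_statistics(past)
q = initial_table(mean_table)
```

In this toy case the entry for `("s0", "a")` varies across tasks, so a simulated backup touching it would receive a higher priority than one touching `("s0", "b")`, whose value was identical in both tasks.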


Pages: 1108-1113

Page count: 6
