Multitask reinforcement learning on the distribution of MDPs

Cited by: 0

Authors:

Tanaka, F [1]

Yamamura, M [1]

Affiliations:

[1] Tokyo Inst Technol, Dept Computat Intelligence & Syst Sci, Tokyo 152, Japan

Source:

2003 IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION, VOLS I-III, PROCEEDINGS | 2003

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In this paper we address a new problem in reinforcement learning. We consider an agent that faces multiple learning tasks within its lifetime. The agent's objective is to maximize its total reward over its lifetime as well as the conventional return in each task. To achieve this, the agent must be able to retain its past learning experiences and use them to improve future learning performance. Here we attempt to state this problem formally. The central idea is to introduce a class of environments, BV-MDPs, that is defined by a distribution over MDPs. To exploit past learning experiences, we focus on statistics (mean and deviation) of the agent's value tables. The mean can be used as the initial values of the table when a new task is presented. The deviation can be viewed as a measure of the reliability of the mean, and we use it to calculate the priority of simulated backups. We conduct computer simulation experiments to evaluate the effectiveness of this approach.
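The abstract gives no formulas, so the following is only a minimal Python sketch of the idea under stated assumptions: value tables from finished tasks are summarized by a per-entry mean and standard deviation, the mean seeds the table for a new task, and the deviation raises the priority of simulated backups where the mean is less reliable. The class name `ValueTableStats`, the running-update scheme (Welford's method), and the priority rule `|td_error| * (1 + scale * std)` are all hypothetical choices for illustration, not the authors' method.

```python
import math


class ValueTableStats:
    """Running per-entry mean and deviation of Q-tables across finished tasks.

    Hypothetical helper: the paper's actual statistics and update rules
    are not given in the abstract.
    """

    def __init__(self, n_states, n_actions):
        self.count = 0
        self.mean = [[0.0] * n_actions for _ in range(n_states)]
        self._m2 = [[0.0] * n_actions for _ in range(n_states)]

    def update(self, q_table):
        """Fold one learned value table into the statistics (Welford's method)."""
        self.count += 1
        for s, row in enumerate(q_table):
            for a, q in enumerate(row):
                delta = q - self.mean[s][a]
                self.mean[s][a] += delta / self.count
                self._m2[s][a] += delta * (q - self.mean[s][a])

    def std(self, s, a):
        """Population standard deviation of entry (s, a) over the tasks seen."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2[s][a] / self.count)

    def initial_table(self):
        """Initial value table for a new task: a copy of the mean table."""
        return [row[:] for row in self.mean]

    def backup_priority(self, s, a, td_error, scale=1.0):
        """Assumed rule: a larger deviation (less reliable mean) gives
        the simulated backup of (s, a) a higher priority."""
        return abs(td_error) * (1.0 + scale * self.std(s, a))


# Usage: two finished tasks, each with a 2-state, 2-action value table.
stats = ValueTableStats(n_states=2, n_actions=2)
stats.update([[1.0, 0.0], [2.0, 2.0]])
stats.update([[3.0, 0.0], [2.0, 4.0]])

q_new = stats.initial_table()                      # [[2.0, 0.0], [2.0, 3.0]]
p_unsure = stats.backup_priority(0, 0, td_error=0.5)  # entry varied across tasks
p_stable = stats.backup_priority(0, 1, td_error=0.5)  # entry was identical
```

In this sketch, entries whose values differed across past tasks are backed up earlier than entries that were stable, for the same temporal-difference error. Other weightings of the deviation would be equally consistent with the abstract.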


Pages: 1108-1113

Number of pages: 6

50 items in total

[31] Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
Papini, Matteo
Tirinzoni, Andrea
Pacchiano, Aldo
Restelli, Marcello
Lazaric, Alessandro
Pirotta, Matteo
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[32] Efficient Multitask Reinforcement Learning Without Performance Loss
Baek, Jongchan
Baek, Seungmin
Han, Soohee
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 35 (10) : 1 - 15
[33] Multitask Augmented Random Search in deep reinforcement learning
Thanh, Le Tien
Thang, Ta Bao
Van Cuong, Le
Binh, Huynh Thi Thanh
[J]. APPLIED SOFT COMPUTING, 2024, 160
[34] Multitask Neuroevolution for Reinforcement Learning With Long and Short Episodes
Zhang, Nick
Gupta, Abhishek
Chen, Zefeng
Ong, Yew-Soon
[J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (03) : 1474 - 1486
[35] Adaptive Multifactorial Evolutionary Optimization for Multitask Reinforcement Learning
Martinez, Aritz D.
Del Ser, Javier
Osaba, Eneko
Herrera, Francisco
[J]. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2022, 26 (02) : 233 - 247
[36] Reinforcement Learning through Global Stochastic Search in N-MDPs
Leonetti, Matteo
Iocchi, Luca
Ramamoorthy, Subramanian
[J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2011, 6912 : 326 - 340
[37] Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
Domingues, Omar Darwiche
Menard, Pierre
Kaufmann, Emilie
Valko, Michal
[J]. ALGORITHMIC LEARNING THEORY, VOL 132, 2021, 132
[38] Multitask Learning and Reinforcement Learning for Personalized Dialog Generation: An Empirical Study
Yang, Min
Huang, Weiyi
Tu, Wenting
Qu, Qiang
Shen, Ying
Lei, Kai
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (01) : 49 - 62
[39] Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
Dann, Christoph
Mansour, Yishay
Mohri, Mehryar
Sekhari, Ayush
Sridharan, Karthik
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[40] Model-based reinforcement learning in factored-state MDPs
Strehl, Alexander L.
[J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 103 - 110
