Multi-task Reinforcement Learning in Partially Observable Stochastic Environments

Cited by: 0
Authors
Li, Hui [1 ]
Liao, Xuejun [1 ]
Carin, Lawrence [1 ]
Affiliations
[1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
Keywords
reinforcement learning; partially observable Markov decision processes; multi-task learning; Dirichlet processes; regionalized policy representation; hidden Markov models; infinite horizon; distributions
DOI
Not available
CLC classification
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
We consider the problem of multi-task reinforcement learning (MTRL) in multiple partially observable stochastic environments. We introduce the regionalized policy representation (RPR) to characterize the agent's behavior in each environment. The RPR is a parametric model of the conditional distribution over current actions given the history of past actions and observations; the agent's choice of actions is directly based on this conditional distribution, without an intervening model to characterize the environment itself. We propose off-policy batch algorithms to learn the parameters of the RPRs, using episodic data collected when following a behavior policy, and show their linkage to policy iteration. We employ the Dirichlet process as a nonparametric prior over the RPRs across multiple environments. The intrinsic clustering property of the Dirichlet process imposes sharing of episodes among similar environments, which effectively reduces the number of episodes required for learning a good policy in each environment, when data sharing is appropriate. The number of distinct RPRs and the associated clusters (the sharing patterns) are automatically discovered by exploiting the episodic data as well as the nonparametric nature of the Dirichlet process. We demonstrate the effectiveness of the proposed RPR as well as the RPR-based MTRL framework on various problems, including grid-world navigation and multi-aspect target classification. The experimental results show that the RPR is a competitive reinforcement learning algorithm in partially observable domains, and the MTRL consistently achieves better performance than single-task reinforcement learning.
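As an illustrative aid (not part of the original record), the two mechanisms the abstract describes can be sketched in Python: a toy RPR that samples actions from a latent belief region updated by action-observation pairs, and a Chinese-restaurant-process view of the Dirichlet process that assigns environments to shared policy clusters. The class, the `crp_assign` helper, and all table values are hypothetical placeholders chosen only to make the ideas concrete; the paper's actual learning algorithms are not reproduced here.

```python
import random

random.seed(0)

N_REGIONS, N_ACTIONS, N_OBS = 3, 2, 2

def sample(probs):
    """Sample an index from a discrete distribution given by `probs`."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

class RPR:
    """Toy regionalized policy: a latent belief region z summarizes the
    action/observation history; actions come from p(a | z), and z is
    updated via p(z' | z, a, o). Tables are arbitrary toy numbers."""
    def __init__(self):
        # p(a | z): action distribution per belief region
        self.policy = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
        # p(z' | z, a, o): uniform region-transition tables
        self.trans = [[[[1.0 / N_REGIONS] * N_REGIONS
                        for _ in range(N_OBS)]
                       for _ in range(N_ACTIONS)]
                      for _ in range(N_REGIONS)]
        self.z = 0  # current belief region

    def act(self):
        return sample(self.policy[self.z])

    def update(self, action, obs):
        self.z = sample(self.trans[self.z][action][obs])

def crp_assign(n_envs, alpha=1.0):
    """Chinese-restaurant-process sketch of Dirichlet-process clustering:
    each environment joins an existing RPR cluster with probability
    proportional to the cluster's size, or opens a new cluster with
    probability proportional to the concentration parameter alpha."""
    sizes, assignment = [], []
    for _ in range(n_envs):
        weights = sizes + [alpha]
        total = sum(weights)
        k = sample([w / total for w in weights])
        if k == len(sizes):
            sizes.append(1)   # new cluster, i.e. a new distinct RPR
        else:
            sizes[k] += 1
        assignment.append(k)
    return assignment

clusters = crp_assign(10)
print(clusters)  # sharing pattern over 10 environments
```

Because the number of clusters is not fixed in advance, the sketch mirrors the abstract's point that the number of distinct RPRs (the sharing pattern) emerges from the data and the nonparametric prior rather than being specified by hand.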
Pages: 1131-1186 (56 pages)
Related papers (50 in total)
  • [1] Multi-task reinforcement learning in partially observable stochastic environments
    Li, Hui
    Liao, Xuejun
    Carin, Lawrence
    [J]. Journal of Machine Learning Research, 2009, 10: 1131-1186
  • [2] Learning a navigation task in changing environments by multi-task reinforcement learning
    Grossmann, A
    Poli, R
    [J]. Advances in Robot Learning, Proceedings, 2000, 1812: 23-43
  • [3] Inverse reinforcement learning in partially observable environments
    Choi, Jaedeug
    Kim, Kee-Eung
    [J]. Journal of Machine Learning Research, 2011, 12: 691-730
  • [4] Inverse reinforcement learning in partially observable environments
    Choi, Jaedeug
    Kim, Kee-Eung
    [J]. 21st International Joint Conference on Artificial Intelligence (IJCAI-09), Proceedings, 2009: 1028-1033
  • [5] Multi-task reinforcement learning in humans
    Tomov, Momchil S.
    Schulz, Eric
    Gershman, Samuel J.
    [J]. Nature Human Behaviour, 2021, 5 (06): 764-773
  • [6] Sparse multi-task reinforcement learning
    Calandriello, Daniele
    Lazaric, Alessandro
    Restelli, Marcello
    [J]. Advances in Neural Information Processing Systems 27 (NIPS 2014), 2014, 27
  • [7] Multi-task learning with modular reinforcement learning
    Xue, Jianyong
    Alexandre, Frederic
    [J]. From Animals to Animats 16, 2022, 13499: 127-138
  • [8] Sparse multi-task reinforcement learning
    Calandriello, Daniele
    Lazaric, Alessandro
    Restelli, Marcello
    [J]. Intelligenza Artificiale, 2015, 9 (01): 5-20