Multi-task Reinforcement Learning in Partially Observable Stochastic Environments

Cited: 0
Authors
Li, Hui [1 ]
Liao, Xuejun [1 ]
Carin, Lawrence [1 ]
Affiliations
[1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
Keywords
reinforcement learning; partially observable Markov decision processes; multi-task learning; Dirichlet processes; regionalized policy representation; hidden Markov models; infinite horizon; distributions
DOI: not available
Chinese Library Classification: TP [automation technology; computer technology]
Discipline classification code: 0812
Abstract
We consider the problem of multi-task reinforcement learning (MTRL) in multiple partially observable stochastic environments. We introduce the regionalized policy representation (RPR) to characterize the agent's behavior in each environment. The RPR is a parametric model of the conditional distribution over current actions given the history of past actions and observations; the agent's choice of actions is directly based on this conditional distribution, without an intervening model to characterize the environment itself. We propose off-policy batch algorithms to learn the parameters of the RPRs, using episodic data collected when following a behavior policy, and show their linkage to policy iteration. We employ the Dirichlet process as a nonparametric prior over the RPRs across multiple environments. The intrinsic clustering property of the Dirichlet process imposes sharing of episodes among similar environments, which effectively reduces the number of episodes required for learning a good policy in each environment, when data sharing is appropriate. The number of distinct RPRs and the associated clusters (the sharing patterns) are automatically discovered by exploiting the episodic data as well as the nonparametric nature of the Dirichlet process. We demonstrate the effectiveness of the proposed RPR as well as the RPR-based MTRL framework on various problems, including grid-world navigation and multi-aspect target classification. The experimental results show that the RPR is a competitive reinforcement learning algorithm in partially observable domains, and the MTRL consistently achieves better performance than single-task reinforcement learning.
Pages: 1131-1186
Page count: 56
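
The abstract describes the RPR as a history-conditioned distribution over actions, mediated by latent "belief regions", with a Dirichlet-process prior tying RPRs across environments. The following is a minimal, hypothetical sketch of such a history-conditioned policy, not the authors' implementation: the parameter names (pi, W, phi), the toy dimensions, the random Dirichlet initialization, and the helper action_distribution are all assumptions made for illustration.

```python
# Minimal sketch of an RPR-style history-conditioned policy (illustrative only).
# A small set of latent "belief regions" z mediates between the action-observation
# history and the next action; all parameters below are random placeholders.
import numpy as np

rng = np.random.default_rng(0)

n_regions, n_actions, n_obs = 3, 2, 4   # assumed toy sizes

# pi[z]          : distribution over the initial belief region
# W[z, a, o, z'] : region transition probability given previous region, action, observation
# phi[z, a]      : action probabilities within a region
pi = rng.dirichlet(np.ones(n_regions))
W = rng.dirichlet(np.ones(n_regions), size=(n_regions, n_actions, n_obs))
phi = rng.dirichlet(np.ones(n_actions), size=n_regions)


def action_distribution(history, pi=pi, W=W, phi=phi):
    """p(a_t | h_t) for a history h_t = [(a_0, o_1), (a_1, o_2), ...],
    computed by forward-filtering the latent belief region."""
    belief = pi.copy()                     # p(z_0)
    for a, o in history:
        belief = belief * phi[:, a]        # weight regions by the action actually taken
        belief = belief @ W[:, a, o, :]    # propagate regions given (action, observation)
        belief /= belief.sum()             # renormalize to a distribution
    return belief @ phi                    # marginalize regions -> p(a | history)


# Usage: sample an action after two (action, observation) steps.
hist = [(0, 2), (1, 0)]
p_a = action_distribution(hist)
print("p(a | history) =", p_a, " -> action", rng.choice(n_actions, p=p_a))
```

In the full MTRL framework described in the abstract, one such parameter set would be associated with each environment and drawn from a Dirichlet-process prior, whose clustering behavior lets similar environments share episodes; that learning machinery is omitted from this sketch.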