Pessimistic value iteration for multi-task data sharing in Offline Reinforcement Learning

Cited by: 1
Authors
Bai, Chenjia [1,6]
Wang, Lingxiao [2]
Hao, Jianye [3]
Yang, Zhuoran [4]
Zhao, Bin [1,5]
Wang, Zhen [5]
Li, Xuelong [1,5]
Affiliations
[1] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
[2] Northwestern Univ, Dept Ind Engn & Management Sci, Evanston, IL USA
[3] Tianjin Univ, Tianjin, Peoples R China
[4] Yale Univ, Dept Stat & Data Sci, New Haven, CT USA
[5] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian, Peoples R China
[6] Northwestern Polytech Univ, Shenzhen Res Inst, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Uncertainty quantification; Data sharing; Pessimistic value iteration; Offline Reinforcement Learning;
DOI
10.1016/j.artint.2023.104048
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset. However, successful offline RL often relies heavily on the coverage and quality of the given dataset. In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks, namely, to conduct Multi-Task Data Sharing (MTDS). Nevertheless, directly sharing datasets from other tasks exacerbates the distribution shift in offline RL. In this paper, we propose an uncertainty-based MTDS approach that shares the entire dataset without data selection. Using ensemble-based uncertainty quantification, we perform pessimistic value iteration on the shared offline dataset, which provides a unified framework for single- and multi-task offline RL. We further provide theoretical analysis, which shows that the optimality gap of our method is only related to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing. Empirically, we release an MTDS benchmark and collect datasets from three challenging domains. The experimental results show that our algorithm outperforms the previous state-of-the-art methods in challenging MTDS problems.
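The abstract combines ensemble-based uncertainty quantification with pessimistic value iteration over the shared dataset. The sketch below is a minimal, hypothetical PyTorch illustration of one common way such a lower-confidence-bound Bellman target can be formed; it is not the authors' released code, and the names (QEnsemble, pessimistic_target) and the exact penalty form (ensemble mean minus beta times ensemble standard deviation) are assumptions made purely for exposition.

```python
# Illustrative sketch only: ensemble Q-networks with a pessimistic
# (lower-confidence-bound) Bellman target. All names and the specific
# penalty form are hypothetical, not the paper's implementation.
import torch
import torch.nn as nn


class QEnsemble(nn.Module):
    """K independent Q-networks used to quantify epistemic uncertainty."""

    def __init__(self, obs_dim, act_dim, n_members=5, hidden=256):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_members)
        ])

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        # Stack per-member values: shape (K, batch)
        return torch.stack([m(x).squeeze(-1) for m in self.members])


def pessimistic_target(q_ens, next_obs, next_act, reward, done,
                       gamma=0.99, beta=1.0):
    """Bellman target penalized by ensemble disagreement (mean - beta * std)."""
    with torch.no_grad():
        q_next = q_ens(next_obs, next_act)            # (K, batch)
        lcb = q_next.mean(0) - beta * q_next.std(0)   # pessimistic next value
        return reward + gamma * (1.0 - done) * lcb
```

In a scheme of this kind, a larger beta more strongly penalizes state-action pairs on which the ensemble disagrees, which is typically the case for transitions borrowed from other tasks' datasets.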
Pages: 30
Related Papers
(50 records in total; the first results are shown below)
  • [1] Yu, Tianhe; Kumar, Aviral; Chebotar, Yevgen; Hausman, Karol; Levine, Sergey; Finn, Chelsea. Conservative Data Sharing for Multi-Task Offline Reinforcement Learning. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [2] Wu, Runzhe; Zhang, Yufeng; Yang, Zhuoran; Wang, Zhaoran. Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [3] Yoo, Minjong; Cho, Sangwoo; Woo, Honguk. Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [4] Valdettaro, Filippo; Faisal, A. Aldo. Towards Offline Reinforcement Learning with Pessimistic Value Priors. Epistemic Uncertainty in Artificial Intelligence (Epi UAI 2023), 2024, 14523: 89-100.
  • [5] Piao, Chengkai; Wei, Jinmao. Fitting and sharing multi-task learning. Applied Intelligence, 2024, 54(9-10): 6918-6929.
  • [6] Tomov, Momchil S.; Schulz, Eric; Gershman, Samuel J. Multi-task reinforcement learning in humans. Nature Human Behaviour, 2021, 5(6): 764-773.
  • [7] Calandriello, Daniele; Lazaric, Alessandro; Restelli, Marcello. Sparse Multi-Task Reinforcement Learning. Advances in Neural Information Processing Systems 27 (NIPS 2014), 2014.
  • [8] Xue, Jianyong; Alexandre, Frederic. Multi-task Learning with Modular Reinforcement Learning. From Animals to Animats 16, 2022, 13499: 127-138.
  • [9] Calandriello, Daniele; Lazaric, Alessandro; Restelli, Marcello. Sparse multi-task reinforcement learning. Intelligenza Artificiale, 2015, 9(1): 5-20.