Robust Reinforcement Learning via Progressive Task Sequence

Cited by: 0
Authors:
Li, Yike [1]
Tian, Yunzhe [1]
Tong, Endong [1]
Niu, Wenjia [1]
Liu, Jiqiang [1]
Affiliations:
[1] Beijing Jiaotong University, Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing, People's Republic of China
Funding: National Natural Science Foundation of China
Keywords:
DOI: Not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Robust reinforcement learning (RL) has been a challenging problem due to the gap between simulation and the real world. Existing efforts typically address robust RL by solving a max-min problem, the idea being to maximize the cumulative reward under the worst possible perturbations. However, worst-case optimization leads either to overly conservative solutions or to an unstable training process, which in turn degrades policy robustness and generalization performance. In this paper, we tackle this problem through both the problem formulation and the algorithm design. First, we formulate robust RL as a max-expectation optimization problem, where the goal is to find an optimal policy under both worst-case and non-worst-case conditions. Then, we propose DRRL, a novel framework for solving the max-expectation optimization. Given our definition of feasible tasks, a task generation and sequencing mechanism is introduced that dynamically produces tasks at a difficulty level appropriate for the current policy. With these progressive tasks, DRRL realizes dynamic multi-task learning, improving both policy robustness and training stability. Finally, extensive experiments demonstrate that the proposed method performs strongly on the unmanned CarRacing game and on multiple high-dimensional MuJoCo environments.
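The abstract contrasts the standard worst-case (max-min) objective with the paper's max-expectation objective. As a rough illustrative sketch only (the notation below, including the perturbation set Δ and the task distribution p, is assumed rather than taken from the paper), the two formulations can be written as:

```latex
% Conventional robust RL (max-min): maximize the return attained
% under the worst admissible perturbation \delta from a set \Delta.
\max_{\pi}\ \min_{\delta \in \Delta}\
  \mathbb{E}_{\tau \sim (\pi,\delta)}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]

% Max-expectation formulation described in the abstract: average the
% return over a perturbation/task distribution p that covers both
% worst-case and non-worst-case conditions.
\max_{\pi}\ \mathbb{E}_{\delta \sim p}\;
  \mathbb{E}_{\tau \sim (\pi,\delta)}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
```

The task generation and sequencing mechanism is described only at a high level in the abstract. A minimal curriculum-style sketch of the idea, in which every name, threshold, and the difficulty heuristic are assumptions rather than details of DRRL, might look like:

```python
import random

def generate_feasible_tasks(difficulty, n_tasks=8):
    """Hypothetical task generator: sample perturbation magnitudes
    around the requested difficulty level (a stand-in for the
    paper's feasible-task definition)."""
    return [max(0.0, random.gauss(difficulty, 0.1)) for _ in range(n_tasks)]

def progressive_task_sequence(policy, evaluate, update, epochs=100):
    """Curriculum-style loop: train on a batch of tasks, then raise
    (or lower) task difficulty based on how the current policy copes.
    `evaluate` is assumed to return a normalized return in [0, 1];
    `update` performs one multi-task policy update."""
    difficulty = 0.1  # start with mild perturbations
    for _ in range(epochs):
        tasks = generate_feasible_tasks(difficulty)
        returns = [evaluate(policy, task) for task in tasks]
        update(policy, tasks, returns)
        avg = sum(returns) / len(returns)
        if avg > 0.8:        # policy handles this level: push harder
            difficulty = min(1.0, difficulty + 0.05)
        elif avg < 0.3:      # tasks too hard: back off
            difficulty = max(0.05, difficulty - 0.05)
    return policy
```

Intuitively, the sequencing step keeps the policy training near the edge of its competence, which is how a max-expectation objective can cover worst cases without collapsing into purely worst-case, and hence overly conservative, training.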
Pages: 455-463
Number of pages: 9
Related papers (50 items):
  • [21] Robust Reinforcement Learning via Adversarial training with Langevin Dynamics
    Kamalaruban, Parameswaran
    Huang, Yu-Ting
    Hsieh, Ya-Ping
    Rolland, Paul
    Shi, Cheng
    Cevher, Volkan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [22] RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
    Yang, Rui
    Bai, Chenjia
    Ma, Xiaoteng
    Wang, Zhaoran
    Zhang, Chongjie
    Han, Lei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [23] Interpretable, Verifiable, and Robust Reinforcement Learning via Program Synthesis
    Bastani, Osbert
    Inala, Jeevana Priya
    Solar-Lezama, Armando
    XXAI - BEYOND EXPLAINABLE AI: International Workshop, Held in Conjunction with ICML 2020, July 18, 2020, Vienna, Austria, Revised and Extended Papers, 2022, 13200: 207-228
  • [24] Robust Imitation via Mirror Descent Inverse Reinforcement Learning
    Han, Dong-Sig
    Kim, Hyunseo
    Lee, Hyundo
    Ryu, Je-Hwan
    Zhang, Byoung-Tak
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [25] Robust reinforcement learning
    Morimoto, J
    Doya, K
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 13, 2001, 13: 1061-1067
  • [26] Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization
    Zhang, Tiantian
    Lin, Zichuan
    Wang, Yuxing
    Ye, Deheng
    Fu, Qiang
    Yang, Wei
    Wang, Xueqian
    Liang, Bin
    Yuan, Bo
    Li, Xiu
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35(10): 14588-14602
  • [27] Robust reinforcement learning
    Morimoto, J
    Doya, K
    NEURAL COMPUTATION, 2005, 17(2): 335-359
  • [28] A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning
    Wang, Zhi
    Chen, Chunlin
    Dong, Daoyi
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53(12): 7509-7520
  • [29] Enhancing Task Performance of Learned Simplified Models via Reinforcement Learning
    Bui, Hien
    Posa, Michael
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024), 2024: 9212-9219
  • [30] Options in Multi-task Reinforcement Learning - Transfer via Reflection
    Denis, Nicholas
    Fraser, Maia
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, 11489: 225-237