Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness

Cited by: 0
Authors
Wen, Xiaoyu [1 ]
Yu, Xudong [2 ]
Yang, Rui [3 ]
Chen, Haoyuan [1 ]
Bai, Chenjia [4]
Wang, Zhen [5]
Affiliations
[1] Northwestern Polytechnical University, Shaanxi, Xi’an, China
[2] Harbin Institute of Technology, Heilongjiang, Harbin, China
[3] The Hong Kong University of Science and Technology, China
[4] Shanghai Artificial Intelligence Laboratory, Shanghai, China
[5] Shenzhen Research Institute, Northwestern Polytechnical University, Guangdong, Shenzhen, China
Keywords
Adversarial machine learning
DOI
10.1613/jair.1.16457
Abstract
To obtain a near-optimal policy with fewer interactions in Reinforcement Learning (RL), a promising approach is to combine offline RL, which improves sample efficiency by leveraging offline datasets, with online RL, which explores informative transitions by interacting with the environment. Offline-to-online RL provides a paradigm for improving an offline-trained agent within limited online interactions. However, due to the significant distribution shift between online experiences and offline data, most offline RL algorithms suffer from performance drops and fail to achieve stable policy improvement during offline-to-online adaptation. To address this problem, we propose the Robust Offline-to-Online (RO2O) algorithm, which enhances offline policies through uncertainty and smoothness and mitigates the performance drop during online adaptation. Specifically, RO2O incorporates a Q-ensemble for uncertainty penalization and adversarial samples for policy and value smoothness, which enable RO2O to maintain a consistent learning procedure in online adaptation without requiring special changes to the learning objective. Theoretical analyses in linear MDPs show that the uncertainty and smoothness yield a tighter optimality bound in offline-to-online learning under distribution shift. Experimental results illustrate the superiority of RO2O in facilitating stable offline-to-online learning and achieving significant improvement with limited online interactions. ©2024 The Authors.
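The two ingredients the abstract names, an ensemble-based uncertainty penalty on the Bellman target and an adversarial smoothness regularizer on the policy, can be sketched compactly. Below is a minimal PyTorch illustration, not the authors' implementation; the names (QEnsemble, beta, epsilon, num_critics) and the gradient-sign perturbation scheme are assumptions made for exposition.

```python
# Minimal sketch of uncertainty penalization via a Q-ensemble and an
# adversarial smoothness regularizer. Illustrative only: names and the
# perturbation construction are assumptions, not taken from the RO2O paper.
import torch
import torch.nn as nn


class QEnsemble(nn.Module):
    """N independent Q-networks over concatenated (state, action)."""

    def __init__(self, state_dim, action_dim, num_critics=10, hidden=256):
        super().__init__()
        self.nets = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            ) for _ in range(num_critics)
        ])

    def forward(self, state, action):
        sa = torch.cat([state, action], dim=-1)
        # Stack per-critic estimates: shape (num_critics, batch, 1).
        return torch.stack([net(sa) for net in self.nets], dim=0)


def uncertainty_penalized_target(q_ensemble, next_state, next_action,
                                 reward, done, gamma=0.99, beta=1.0):
    """Pessimistic Bellman target: ensemble mean minus beta * ensemble std."""
    with torch.no_grad():
        q_next = q_ensemble(next_state, next_action)         # (N, B, 1)
        pessimistic = q_next.mean(0) - beta * q_next.std(0)  # uncertainty penalty
        return reward + gamma * (1.0 - done) * pessimistic


def smoothness_loss(policy, state, epsilon=1e-3):
    """Adversarial smoothness: perturb the state along the gradient that
    most changes the policy output, then penalize the resulting change."""
    state = state.clone().requires_grad_(True)
    action = policy(state)
    grad = torch.autograd.grad(action.sum(), state)[0]
    adv_state = (state + epsilon * grad.sign()).detach()
    return ((policy(adv_state) - policy(state.detach())) ** 2).mean()
```

Here beta trades off pessimism against value magnitude, and the smoothness term penalizes how much the policy output moves under a small worst-direction perturbation of the state; the paper's actual objectives and perturbation construction may differ.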
Pages: 481-509
Related Papers
50 results in total
  • [1] Learning Aerial Docking via Offline-to-Online Reinforcement Learning
    Tao, Yang
    Feng, Yuting
    Yu, Yushu
    [J]. 2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024, 2024, : 305 - 309
  • [2] Sample Efficient Offline-to-Online Reinforcement Learning
    Guo, Siyuan
    Zou, Lixin
    Chen, Hechang
    Qu, Bohao
    Chi, Haotian
    Yu, Philip S.
    Chang, Yi
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (03) : 1299 - 1310
  • [3] Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
    Zheng, Han
    Luo, Xufang
    Wei, Pengfei
    Song, Xuan
    Li, Dongsheng
    Jiang, Jing
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 11372 - 11380
  • [4] Effective Traffic Signal Control with Offline-to-Online Reinforcement Learning
    Ma, Jinming
    Wu, Feng
    [J]. 2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 5567 - 5573
  • [5] DCAC: Reducing Unnecessary Conservatism in Offline-to-online Reinforcement Learning
    Chen, Dongxiang
    Wen, Ying
    [J]. 2023 5TH INTERNATIONAL CONFERENCE ON DISTRIBUTED ARTIFICIAL INTELLIGENCE, DAI 2023, 2023,
  • [6] Ensemble successor representations for task generalization in offline-to-online reinforcement learning
    Wang, Changhong
    Yu, Xudong
    Bai, Chenjia
    Zhang, Qiaosheng
    Wang, Zhen
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (07)
  • [7] A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
    Zhang, Yinmin
    Liu, Jie
    Li, Chuming
    Niu, Yazhe
    Yang, Yaodong
    Liu, Yu
    Ouyang, Wanli
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 15, 2024, : 16908 - 16916
  • [8] SUF: Stabilized Unconstrained Fine-Tuning for Offline-to-Online Reinforcement Learning
    Feng, Jiaheng
    Feng, Mingxiao
    Song, Haolin
    Zhou, Wengang
    Li, Houqiang
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 11, 2024, : 11961 - 11969