Sample Efficient Offline-to-Online Reinforcement Learning

Cited by: 3
|
Authors
Guo, Siyuan [1 ,2 ]
Zou, Lixin [3 ]
Chen, Hechang [1 ]
Qu, Bohao [1 ]
Chi, Haotian [1 ]
Yu, Philip S. [4 ]
Chang, Yi [1 ,2 ]
Affiliations
[1] Jilin Univ, Engn Res Ctr Knowledge Driven Human Machine Intell, Sch Artificial Intelligence, Changchun 130012, Jilin, Peoples R China
[2] Jilin Univ, Int Ctr Future Sci, Changchun 130012, Jilin, Peoples R China
[3] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan 430072, Hubei, Peoples R China
[4] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China;
Keywords
Behavioral sciences; Perturbation methods; Uncertainty; Metalearning; Adaptation models; Q-learning; Faces; Meta learning; offline-to-online reinforcement learning; optimistic exploration; sample efficiency;
DOI
10.1109/TKDE.2023.3302804
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline reinforcement learning (RL) makes it possible to train agents entirely from a previously collected dataset. However, constrained by the quality of the offline dataset, offline RL agents typically achieve limited performance and cannot be deployed directly. It is therefore desirable to further fine-tune pretrained offline RL agents via online interaction with the environment. Existing offline-to-online RL algorithms suffer from low sample efficiency due to two inherent challenges, i.e., exploration limitation and distribution shift. To this end, we propose a sample-efficient offline-to-online RL algorithm via Optimistic Exploration and Meta Adaptation (OEMA). Specifically, we first propose an optimistic exploration strategy based on the principle of optimism in the face of uncertainty, which allows agents to explore the environment sufficiently yet stably. Moreover, we propose a meta-learning-based adaptation method that reduces the distribution shift and accelerates the offline-to-online adaptation process. We empirically demonstrate that OEMA improves sample efficiency on the D4RL benchmark. In addition, we provide in-depth analyses to verify the effectiveness of both optimistic exploration and meta adaptation.
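The "optimism in the face of uncertainty" principle mentioned in the abstract can be illustrated with a minimal sketch. The Python snippet below is not the paper's OEMA implementation; it only shows the generic pattern of scoring candidate actions with an ensemble of Q-networks and taking the one with the highest optimistic (mean plus uncertainty bonus) estimate. The class and function names, the ensemble size, and the bonus coefficient `beta` are all assumptions made for illustration.

```python
# Minimal sketch of optimism-in-the-face-of-uncertainty exploration with a
# Q-ensemble (illustrative only; not the OEMA algorithm from the paper).
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """A simple state-action value network (hypothetical architecture)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


@torch.no_grad()
def optimistic_action(state, candidate_actions, q_ensemble, beta: float = 1.0):
    """Pick the candidate action maximizing mean(Q) + beta * std(Q).

    state:             1-D tensor of shape (state_dim,)
    candidate_actions: tensor of shape (n_candidates, action_dim), e.g. sampled
                       by perturbing the pretrained offline policy's output
    q_ensemble:        list of QNetwork instances
    beta:              assumed coefficient trading exploitation vs. optimism
    """
    states = state.unsqueeze(0).expand(candidate_actions.shape[0], -1)
    # Evaluate every ensemble member on every candidate action.
    qs = torch.stack([q(states, candidate_actions).squeeze(-1)
                      for q in q_ensemble])          # (n_ensemble, n_candidates)
    # Ensemble disagreement (std) acts as the uncertainty bonus.
    score = qs.mean(dim=0) + beta * qs.std(dim=0)
    return candidate_actions[score.argmax()]
```

In this sketch, a larger `beta` pushes the agent toward actions where the ensemble disagrees, i.e., toward under-explored regions, while `beta = 0` reduces to greedy exploitation of the offline value estimate.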
Pages: 1299 - 1310
Number of pages: 12
Related Papers
50 records in total
  • [1] Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
    Zheng, Han
    Luo, Xufang
    Wei, Pengfei
    Song, Xuan
    Li, Dongsheng
    Jiang, Jing
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 11372 - 11380
  • [2] Learning Aerial Docking via Offline-to-Online Reinforcement Learning
    Tao, Yang
Feng, Yuting
    Yu, Yushu
    [J]. 2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024, 2024, : 305 - 309
  • [3] Effective Traffic Signal Control with Offline-to-Online Reinforcement Learning
    Ma, Jinming
    Wu, Feng
    [J]. 2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 5567 - 5573
  • [4] DCAC: Reducing Unnecessary Conservatism in Offline-to-online Reinforcement Learning
    Chen, Dongxiang
    Wen, Ying
    [J]. 2023 5TH INTERNATIONAL CONFERENCE ON DISTRIBUTED ARTIFICIAL INTELLIGENCE, DAI 2023, 2023,
  • [5] Ensemble successor representations for task generalization in offline-to-online reinforcement learning
    Wang, Changhong
    Yu, Xudong
    Bai, Chenjia
    Zhang, Qiaosheng
    Wang, Zhen
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (07)
  • [6] Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
    Wen, Xiaoyu
    Yu, Xudong
    Yang, Rui
    Chen, Haoyuan
    Bai, C.
    Wang, Z.
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2024, 81 : 481 - 509
  • [7] A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
    Zhang, Yinmin
    Liu, Jie
    Li, Chuming
    Niu, Yazhe
    Yang, Yaodong
    Liu, Yu
    Ouyang, Wanli
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 15, 2024, : 16908 - 16916
  • [8] SUF: Stabilized Unconstrained Fine-Tuning for Offline-to-Online Reinforcement Learning
    Feng, Jiaheng
    Feng, Mingxiao
    Song, Haolin
    Zhou, Wengang
    Li, Houqiang
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 11, 2024, : 11961 - 11969