Sample Efficient Offline-to-Online Reinforcement Learning

Times Cited: 3
Authors
Guo, Siyuan [1 ,2 ]
Zou, Lixin [3 ]
Chen, Hechang [1 ]
Qu, Bohao [1 ]
Chi, Haotian [1 ]
Yu, Philip S. [4 ]
Chang, Yi [1 ,2 ]
Affiliations
[1] Jilin Univ, Engn Res Ctr Knowledge Driven Human Machine Intell, Sch Artificial Intelligence, Changchun 130012, Jilin, Peoples R China
[2] Jilin Univ, Int Ctr Future Sci, Changchun 130012, Jilin, Peoples R China
[3] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan 430072, Hubei, Peoples R China
[4] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China
Keywords
Behavioral sciences; Perturbation methods; Uncertainty; Metalearning; Adaptation models; Q-learning; Faces; Meta learning; offline-to-online reinforcement learning; optimistic exploration; sample efficiency;
DOI
10.1109/TKDE.2023.3302804
CLC Classification
TP18 (Theory of Artificial Intelligence)
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) makes it possible to train agents entirely from a previously collected dataset. However, constrained by the quality of the offline dataset, offline RL agents typically achieve limited performance and cannot be deployed directly. It is therefore desirable to further fine-tune pretrained offline RL agents via online interaction with the environment. Existing offline-to-online RL algorithms suffer from low sample efficiency, owing to two inherent challenges: exploration limitation and distribution shift. To this end, we propose a sample-efficient offline-to-online RL algorithm via Optimistic Exploration and Meta Adaptation (OEMA). Specifically, we first propose an optimistic exploration strategy that follows the principle of optimism in the face of uncertainty, allowing agents to explore the environment sufficiently yet stably. Moreover, we propose a meta-learning-based adaptation method that reduces the distribution shift and accelerates the offline-to-online adaptation process. We empirically demonstrate that OEMA improves sample efficiency on the D4RL benchmark. Furthermore, we provide in-depth analyses verifying the effectiveness of both optimistic exploration and meta adaptation.
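The abstract names the optimism-in-the-face-of-uncertainty principle without giving details. As context, the sketch below shows one common, generic way such an exploration bonus is realized in continuous-control RL: an ensemble of Q-networks scores candidate actions by their mean Q-value plus a multiple of the ensemble standard deviation. This is a minimal illustration of the general principle, not the paper's OEMA implementation; QEnsemble, optimistic_action, policy, beta, and n_candidates are all assumed names and hyperparameters.

import torch
import torch.nn as nn


class QEnsemble(nn.Module):
    # Ensemble of independent Q-networks; disagreement among members serves
    # as an uncertainty estimate for the optimism bonus (illustrative only).
    def __init__(self, state_dim, action_dim, n_members=5, hidden=256):
        super().__init__()
        self.members = nn.ModuleList(
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_members)
        )

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        # Returns Q-estimates of shape (n_members, batch).
        return torch.stack([m(x).squeeze(-1) for m in self.members])


def optimistic_action(q_ensemble, policy, state, beta=1.0, n_candidates=10):
    # Sample candidate actions from an assumed stochastic policy and pick, per
    # state, the one maximizing the optimistic value mean_i Q_i + beta * std_i Q_i.
    with torch.no_grad():
        candidates = [policy(state) for _ in range(n_candidates)]
        scores = []
        for a in candidates:
            q = q_ensemble(state, a)                      # (n_members, batch)
            scores.append(q.mean(dim=0) + beta * q.std(dim=0))
        best = torch.stack(scores).argmax(dim=0)          # (batch,)
        actions = torch.stack(candidates)                 # (n_candidates, batch, act_dim)
        return actions[best, torch.arange(best.shape[0])]

Here beta controls the strength of optimism: larger values push the agent toward actions whose value the ensemble is uncertain about, which is the generic trade-off the principle describes.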
Pages
1299-1310 (12 pages)
Related Papers
50 records in total
  • [21] Kang, Bingyi; Ma, Xiao; Du, Chao; Pang, Tianyu; Yan, Shuicheng. Efficient Diffusion Policies for Offline Reinforcement Learning. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [22] Schrittwieser, Julian; Hubert, Thomas; Mandhane, Amol; Barekatain, Mohammadamin; Antonoglou, Ioannis; Silver, David. Online and Offline Reinforcement Learning by Planning with a Learned Model. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.
  • [23] Wei, Zhiyong; Dou, Wenyu; Jiang, Qingyun; Gu, Chenyan. Influence of incentive frames on offline-to-online interaction of outdoor advertising. Journal of Retailing and Consumer Services, 2021, 58.
  • [24] Wang, Chuang; Wang, Zidong; Liu, Weibo; Shen, Yuxuan; Dong, Hongli. A Novel Deep Offline-to-Online Transfer Learning Framework for Pipeline Leakage Detection With Small Samples. IEEE Transactions on Instrumentation and Measurement, 2023, 72.
  • [26] Mao, Keming; Chen, Chen; Zhang, Jinkai; Li, Yiyang. ORLEP: an efficient offline reinforcement learning evaluation platform. Multimedia Tools and Applications, 2023, 83(12): 37073-37087.
  • [27] Zhang, Longfei; Feng, Yanghe; Wang, Rongxiao; Xu, Yue; Xu, Naifu; Liu, Zeyi; Du, Hang. Efficient experience replay architecture for offline reinforcement learning. Robotic Intelligence and Automation, 2023, 43(1): 35-43.
  • [29] Baker, Frazier N.; Chen, Ziqi; Adu-Ampratwum, Daniel; Ning, Xia. RLSynC: Offline-Online Reinforcement Learning for Synthon Completion. Journal of Chemical Information and Modeling, 2024, 64(17): 6723-6735.
  • [30] Li, Xiali; Lv, Zhengyu; Wu, Licheng; Zhao, Yue; Xu, Xiaona. Hybrid Online and Offline Reinforcement Learning for Tibetan Jiu Chess. Complexity, 2020, 2020.