Sample Efficient Offline-to-Online Reinforcement Learning

Cited by: 3
Authors
Guo, Siyuan [1 ,2 ]
Zou, Lixin [3 ]
Chen, Hechang [1 ]
Qu, Bohao [1 ]
Chi, Haotian [1 ]
Yu, Philip S. [4 ]
Chang, Yi [1 ,2 ]
Affiliations
[1] Jilin Univ, Engn Res Ctr Knowledge Driven Human Machine Intell, Sch Artificial Intelligence, Changchun 130012, Jilin, Peoples R China
[2] Jilin Univ, Int Ctr Future Sci, Changchun 130012, Jilin, Peoples R China
[3] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan 430072, Hubei, Peoples R China
[4] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China;
Keywords
Behavioral sciences; Perturbation methods; Uncertainty; Adaptation models; Q-learning; Faces; Meta-learning; offline-to-online reinforcement learning; optimistic exploration; sample efficiency;
DOI
10.1109/TKDE.2023.3302804
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline reinforcement learning (RL) makes it possible to train agents entirely from a previously collected dataset. However, constrained by the quality of the offline dataset, offline RL agents typically achieve limited performance and cannot be directly deployed. It is therefore desirable to further fine-tune pretrained offline RL agents via online interaction with the environment. Existing offline-to-online RL algorithms suffer from low sample efficiency due to two inherent challenges, i.e., limited exploration and distribution shift. To this end, we propose a sample-efficient offline-to-online RL algorithm via Optimistic Exploration and Meta Adaptation (OEMA). Specifically, we first propose an optimistic exploration strategy following the principle of optimism in the face of uncertainty, which allows agents to explore the environment sufficiently yet stably. Moreover, we propose a meta-learning-based adaptation method that reduces the distribution shift and accelerates the offline-to-online adaptation process. We empirically demonstrate that OEMA improves sample efficiency on the D4RL benchmark. Besides, we provide in-depth analyses to verify the effectiveness of both optimistic exploration and meta adaptation.
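The abstract's "optimism in the face of uncertainty" principle is commonly realized by adding an uncertainty bonus to value estimates, e.g., from disagreement across a Q-ensemble. The sketch below is a generic illustration of that principle, not OEMA's actual exploration strategy; all names (`optimistic_action`, `q_ensemble`, `beta`) and the toy Q-functions are illustrative assumptions.

```python
import statistics

def optimistic_action(q_ensemble, state, actions, beta=1.0):
    """Select the action maximizing mean + beta * std of ensemble Q-values.

    `q_ensemble` is a list of callables q(state, action) -> float; the
    standard deviation across members serves as an uncertainty bonus, so
    larger `beta` means more optimistic (exploratory) action selection.
    """
    def score(a):
        qs = [q(state, a) for q in q_ensemble]
        return statistics.mean(qs) + beta * statistics.stdev(qs)
    return max(actions, key=score)

# Toy ensemble of two Q-functions that agree on action 0 but disagree
# on action 1, making action 1 the more uncertain choice.
q1 = lambda s, a: [0.5, 0.1][a]
q2 = lambda s, a: [0.5, 0.8][a]
```

With `beta=0` the agent greedily picks action 0 (mean 0.5 beats mean 0.45), but with `beta=1` the uncertainty bonus on action 1 (std ≈ 0.49) flips the choice, which is exactly the exploratory bias the principle describes.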
Pages: 1299-1310
Page count: 12