Sample Efficient Offline-to-Online Reinforcement Learning

Cited by: 3
Authors
Guo, Siyuan [1 ,2 ]
Zou, Lixin [3 ]
Chen, Hechang [1 ]
Qu, Bohao [1 ]
Chi, Haotian [1 ]
Yu, Philip S. [4 ]
Chang, Yi [1 ,2 ]
Affiliations
[1] Jilin Univ, Engn Res Ctr Knowledge Driven Human Machine Intell, Sch Artificial Intelligence, Changchun 130012, Jilin, Peoples R China
[2] Jilin Univ, Int Ctr Future Sci, Changchun 130012, Jilin, Peoples R China
[3] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan 430072, Hubei, Peoples R China
[4] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China;
Keywords
Behavioral sciences; Perturbation methods; Uncertainty; Adaptation models; Q-learning; Faces; Meta-learning; offline-to-online reinforcement learning; optimistic exploration; sample efficiency;
DOI
10.1109/TKDE.2023.3302804
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline reinforcement learning (RL) makes it possible to train agents entirely from a previously collected dataset. However, constrained by the quality of the offline dataset, offline RL agents typically achieve limited performance and cannot be directly deployed. It is therefore desirable to further fine-tune pretrained offline RL agents via online interaction with the environment. Existing offline-to-online RL algorithms suffer from low sample efficiency due to two inherent challenges, i.e., limited exploration and distribution shift. To this end, we propose a sample-efficient offline-to-online RL algorithm via Optimistic Exploration and Meta Adaptation (OEMA). Specifically, we first propose an optimistic exploration strategy following the principle of optimism in the face of uncertainty, which allows agents to explore the environment sufficiently yet stably. Moreover, we propose a meta-learning-based adaptation method that reduces the distribution shift and accelerates the offline-to-online adaptation process. We empirically demonstrate that OEMA improves sample efficiency on the D4RL benchmark. Besides, we provide in-depth analyses to verify the effectiveness of both optimistic exploration and meta adaptation.
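The abstract's "optimism in the face of uncertainty" principle is commonly realized by adding an uncertainty bonus to value estimates, e.g., from disagreement across a Q-ensemble. The sketch below is a generic illustration of that principle, not OEMA's actual exploration strategy; all names (`optimistic_action`, `q_ensemble`, `beta`) and the toy Q-functions are illustrative assumptions.

```python
import statistics

def optimistic_action(q_ensemble, state, actions, beta=1.0):
    """Select the action maximizing mean + beta * std of ensemble Q-values.

    `q_ensemble` is a list of callables q(state, action) -> float; the
    standard deviation across members serves as an uncertainty bonus, so
    larger `beta` means more optimistic (exploratory) action selection.
    """
    def score(a):
        qs = [q(state, a) for q in q_ensemble]
        return statistics.mean(qs) + beta * statistics.stdev(qs)
    return max(actions, key=score)

# Toy ensemble of two Q-functions that agree on action 0 but disagree
# on action 1, making action 1 the more uncertain choice.
q1 = lambda s, a: [0.5, 0.1][a]
q2 = lambda s, a: [0.5, 0.8][a]
```

With `beta=0` the agent greedily picks action 0 (mean 0.5 beats mean 0.45), but with `beta=1` the uncertainty bonus on action 1 (std ≈ 0.49) flips the choice, which is exactly the exploratory bias the principle describes.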
Pages: 1299-1310
Page count: 12