Boundary Data Augmentation for Offline Reinforcement Learning

Cited: 0
Authors
SHEN Jiahao [1 ,2 ]
JIANG Ke [1 ,2 ]
TAN Xiaoyang [1 ,2 ]
Affiliations
[1] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
[2] MIIT Key Laboratory of Pattern Analysis and Machine Intelligence
Funding
National Key Research and Development Program of China; National Science Foundation (USA);
Keywords
DOI
Not available
CLC Number
TP181 [Automated Reasoning, Machine Learning]
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Offline reinforcement learning (ORL) aims to learn a rational agent purely from previously collected behavior data, without any online interaction. One of the major challenges in ORL is distribution shift, i.e., the mismatch between the knowledge of the learned policy and the reality of the underlying environment. Recent works usually handle this in an overly pessimistic manner, avoiding out-of-distribution (OOD) queries as much as possible, which can hurt the robustness of the agent at unseen states. In this paper, we propose a simple but effective method to address this issue. The key idea is to enhance the robustness of the new policy learned offline by weakening its confidence in highly uncertain regions. We propose to find those regions by simulating them with a modified Generative Adversarial Net (GAN), such that the generated data follow the same distribution as the old experience but are difficult for the behavior policy (or some other reference policy) to handle. We then use this information to regularize the ORL algorithm, penalizing overconfident behavior in these regions. Extensive experiments on several publicly available offline RL benchmarks demonstrate the feasibility and effectiveness of the proposed method.
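
Since the paper itself is not reproduced on this record, the following is only a minimal PyTorch sketch of the mechanism the abstract describes: a GAN generator is kept on the offline data distribution by a discriminator while being pushed toward states that a reference value network scores poorly (the hard "boundary" regions), and the resulting samples feed a regularizer that pushes down the learned Q-values there. The network sizes, the hardness term, the boundary_penalty helper, and all weights are hypothetical illustrations, not the authors' exact formulation.

# Minimal sketch, assuming a PyTorch implementation; all names and weights below
# are illustrative, not the paper's exact design.
import torch
import torch.nn as nn

state_dim, noise_dim, hidden = 17, 8, 256

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

generator = mlp(noise_dim, state_dim)       # G: noise -> synthetic "boundary" state
discriminator = mlp(state_dim, 1)           # D: keeps G close to the offline data distribution
reference_value = mlp(state_dim, 1)         # stand-in value net of the behavior/reference policy
bce = nn.BCEWithLogitsLoss()

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def gan_step(real_states, hardness_weight=0.1):
    """One modified-GAN update: generated states should (i) look like the batch
    (ordinary GAN losses) and (ii) be hard for the reference policy (low value)."""
    batch = real_states.size(0)
    fake_states = generator(torch.randn(batch, noise_dim))

    # Discriminator: real states -> 1, generated states -> 0.
    d_loss = bce(discriminator(real_states), torch.ones(batch, 1)) + \
             bce(discriminator(fake_states.detach()), torch.zeros(batch, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: fool D while *minimizing* the reference value, so samples land
    # in on-distribution but highly uncertain, hard-to-handle regions.
    g_loss = bce(discriminator(fake_states), torch.ones(batch, 1)) + \
             hardness_weight * reference_value(fake_states).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return fake_states.detach()

def boundary_penalty(q_network, policy, boundary_states, weight=1.0):
    """Hypothetical regularizer added to an offline RL loss: push down the learned
    Q-values on the generated boundary states to discourage overconfidence there."""
    actions = policy(boundary_states)
    return weight * q_network(torch.cat([boundary_states, actions], dim=-1)).mean()

In use, gan_step would run alongside the offline RL updates on each minibatch of dataset states, and boundary_penalty would be added to the critic loss of whichever base ORL algorithm is being regularized.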
Pages: 29 - 36
Page count: 8