Flexible Data Augmentation in Off-Policy Reinforcement Learning

Cited: 0
Authors
Rak, Alexandra [1 ]
Skrynnik, Alexey [1 ,2 ]
Panov, Aleksandr I. [1 ,2 ]
Affiliations
[1] Moscow Inst Phys & Technol, Moscow, Russia
[2] Russian Acad Sci, Fed Res Ctr Comp Sci & Control, Moscow, Russia
Keywords
Reinforcement learning; Image augmentation; Rainbow; Regularization;
DOI
10.1007/978-3-030-87986-0_20
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper explores the application of image augmentation, a popular regularization technique in computer vision, to reinforcement learning tasks. The analysis is based on model-free off-policy algorithms. As a regularization, we consider augmenting the frames that are sampled from the model's replay buffer. The evaluated augmentation techniques include random changes in image contrast, random shifting, random cropping, and others. Experiments are conducted in Atari game environments: Breakout, Space Invaders, Berzerk, Wizard of Wor, and Demon Attack. Using augmentations allowed us to obtain results confirming a significant acceleration of the algorithm's convergence. We also propose an adaptive mechanism for selecting the type of augmentation depending on the type of task being performed by the agent.
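The core idea of the abstract — applying image augmentations to frames sampled from the replay buffer before the update step — can be sketched in plain NumPy. This is an illustrative sketch, not the authors' implementation; the function names, padding size, and contrast range are assumptions chosen for the example.

```python
import numpy as np

def random_shift(frames, pad=4):
    """Randomly shift each frame by up to `pad` pixels via pad-and-crop."""
    n, h, w = frames.shape
    # Replicate edge pixels so shifted frames have no blank borders.
    padded = np.pad(frames, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.empty_like(frames)
    for i in range(n):
        top = np.random.randint(0, 2 * pad + 1)
        left = np.random.randint(0, 2 * pad + 1)
        out[i] = padded[i, top:top + h, left:left + w]
    return out

def random_contrast(frames, lo=0.8, hi=1.2):
    """Scale pixel intensities around each frame's mean by a random factor."""
    factors = np.random.uniform(lo, hi, size=(frames.shape[0], 1, 1))
    mean = frames.mean(axis=(1, 2), keepdims=True)
    return np.clip((frames - mean) * factors + mean, 0.0, 255.0)

# Augment a minibatch sampled from the replay buffer before the TD update;
# 84x84 grayscale frames are the standard Atari preprocessing resolution.
batch = np.random.randint(0, 256, size=(32, 84, 84)).astype(np.float32)
aug = random_contrast(random_shift(batch))
print(aug.shape)  # (32, 84, 84)
```

In this setup the augmentation sits entirely on the sampling path: the replay buffer stores raw frames, and a randomly transformed copy is produced each time a batch is drawn, so the same stored transition yields different training inputs over time.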
Pages: 224 - 235
Page count: 12
Related Papers
50 records in total
  • [1] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
    Thomas, Philip S.
    Brunskill, Emma
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [2] Safe and efficient off-policy reinforcement learning
    Munos, Remi
    Stepleton, Thomas
    Harutyunyan, Anna
    Bellemare, Marc G.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [3] Bounds for Off-policy Prediction in Reinforcement Learning
    Joseph, Ajin George
    Bhatnagar, Shalabh
    [J]. 2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 3991 - 3997
  • [4] Off-Policy Reinforcement Learning with Gaussian Processes
    Chowdhary, Girish
    Liu, Miao
    Grande, Robert
    Walsh, Thomas
    How, Jonathan
    Carin, Lawrence
    [J]. IEEE/CAA JOURNAL OF AUTOMATICA SINICA, 2014, 1 (03) : 227 - 238
  • [5] Off-Policy Reinforcement Learning with Delayed Rewards
    Han, Beining
    Ren, Zhizhou
    Wu, Zuofan
    Zhou, Yuan
    Peng, Jian
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [6] A perspective on off-policy evaluation in reinforcement learning
    Li, Lihong
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2019, 13 (05) : 911 - 912
  • [8] Representations for Stable Off-Policy Reinforcement Learning
    Ghosh, Dibya
    Bellemare, Marc G.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [9] On the Reuse Bias in Off-Policy Reinforcement Learning
    Ying, Chengyang
    Hao, Zhongkai
    Zhou, Xinning
    Su, Hang
    Yan, Dong
    Zhu, Jun
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 4513 - 4521
  • [10] Reliable Off-Policy Evaluation for Reinforcement Learning
    Wang, Jie
    Gao, Rui
    Zha, Hongyuan
    [J]. OPERATIONS RESEARCH, 2024, 72 (02) : 699 - 716