Flexible Data Augmentation in Off-Policy Reinforcement Learning

Cited: 0
Authors
Rak, Alexandra [1 ]
Skrynnik, Alexey [1 ,2 ]
Panov, Aleksandr I. [1 ,2 ]
Affiliations
[1] Moscow Inst Phys & Technol, Moscow, Russia
[2] Russian Acad Sci, Fed Res Ctr Comp Sci & Control, Moscow, Russia
Keywords
Reinforcement learning; Image augmentation; Rainbow; Regularization;
DOI
10.1007/978-3-030-87986-0_20
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper explores the application of image augmentation, a popular regularization technique in computer vision, to reinforcement learning tasks. The analysis is based on model-free off-policy algorithms. As a regularization, we consider augmenting the frames that are sampled from the model's replay buffer. The evaluated augmentation techniques include random changes in image contrast, random shifting, random cropping, and others. Experiments are conducted in Atari game environments: Breakout, Space Invaders, Berzerk, Wizard of Wor, and Demon Attack. Using augmentations allowed us to obtain results confirming a significant acceleration of the algorithm's convergence. We also propose an adaptive mechanism for selecting the type of augmentation depending on the type of task being performed by the agent.
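The core idea of the abstract — applying image augmentations to frames sampled from the replay buffer before the update step — can be sketched in plain NumPy. This is an illustrative sketch, not the authors' implementation; the function names, padding size, and contrast range are assumptions chosen for the example.

```python
import numpy as np

def random_shift(frames, pad=4):
    """Randomly shift each frame by up to `pad` pixels via pad-and-crop."""
    n, h, w = frames.shape
    # Replicate edge pixels so shifted frames have no blank borders.
    padded = np.pad(frames, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.empty_like(frames)
    for i in range(n):
        top = np.random.randint(0, 2 * pad + 1)
        left = np.random.randint(0, 2 * pad + 1)
        out[i] = padded[i, top:top + h, left:left + w]
    return out

def random_contrast(frames, lo=0.8, hi=1.2):
    """Scale pixel intensities around each frame's mean by a random factor."""
    factors = np.random.uniform(lo, hi, size=(frames.shape[0], 1, 1))
    mean = frames.mean(axis=(1, 2), keepdims=True)
    return np.clip((frames - mean) * factors + mean, 0.0, 255.0)

# Augment a minibatch sampled from the replay buffer before the TD update;
# 84x84 grayscale frames are the standard Atari preprocessing resolution.
batch = np.random.randint(0, 256, size=(32, 84, 84)).astype(np.float32)
aug = random_contrast(random_shift(batch))
print(aug.shape)  # (32, 84, 84)
```

In this setup the augmentation sits entirely on the sampling path: the replay buffer stores raw frames, and a randomly transformed copy is produced each time a batch is drawn, so the same stored transition yields different training inputs over time.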
Pages: 224 - 235
Page count: 12
Related Papers
50 records in total
  • [1] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
    Thomas, Philip S.
    Brunskill, Emma
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [2] Safe and efficient off-policy reinforcement learning
    Munos, Remi
    Stepleton, Thomas
    Harutyunyan, Anna
    Bellemare, Marc G.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [3] Bounds for Off-policy Prediction in Reinforcement Learning
    Joseph, Ajin George
    Bhatnagar, Shalabh
    [J]. 2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 3991 - 3997
  • [4] Off-Policy Reinforcement Learning with Gaussian Processes
    Chowdhary, Girish
    Liu, Miao
    Grande, Robert
    Walsh, Thomas
    How, Jonathan
    Carin, Lawrence
    [J]. IEEE/CAA JOURNAL OF AUTOMATICA SINICA, 2014, 1 (03) : 227 - 238
  • [5] Off-Policy Reinforcement Learning with Delayed Rewards
    Han, Beining
    Ren, Zhizhou
    Wu, Zuofan
    Zhou, Yuan
    Peng, Jian
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [6] A perspective on off-policy evaluation in reinforcement learning
    Li, Lihong
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2019, 13 (05) : 911 - 912
  • [8] Representations for Stable Off-Policy Reinforcement Learning
    Ghosh, Dibya
    Bellemare, Marc G.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [9] On the Reuse Bias in Off-Policy Reinforcement Learning
    Ying, Chengyang
    Hao, Zhongkai
    Zhou, Xinning
    Su, Hang
    Yan, Dong
    Zhu, Jun
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 4513 - 4521
  • [10] Reliable Off-Policy Evaluation for Reinforcement Learning
    Wang, Jie
    Gao, Rui
    Zha, Hongyuan
    [J]. OPERATIONS RESEARCH, 2024, 72 (02) : 699 - 716