Improving the Performance of Batch-Constrained Reinforcement Learning in Continuous Action Domains via Generative Adversarial Networks

Cited by: 0
Authors
Saglam, Baturay [1]
Dalmaz, Onat [1]
Gonc, Kaan [2]
Kozat, Suleyman S. [1]
Affiliations
[1] Bilkent Univ, Elekt & Elekt Muhendisligi Bolumu, Ankara, Turkey
[2] Bilkent Univ, Bilgisayar Muhendisligi Bolumu, Ankara, Turkey
Source
2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU | 2022
Keywords
deep reinforcement learning; batch-constrained reinforcement learning; offline reinforcement learning
DOI
10.1109/SIU55565.2022.9864786
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
The Batch-Constrained Q-learning algorithm is shown to overcome the extrapolation error and enable deep reinforcement learning agents to learn from a previously collected fixed batch of transitions. However, due to conditional Variational Autoencoders (VAE) used in the data generation module, the BCQ algorithm optimizes a lower variational bound and hence, it is not generalizable to environments with large state and action spaces. In this paper, we show that the performance of the BCQ algorithm can be further improved with the employment of one of the recent advances in deep learning, Generative Adversarial Networks. Our extensive set of experiments shows that the introduced approach significantly improves BCQ in all of the control tasks tested. Moreover, the introduced approach demonstrates robust generalizability to environments with large state and action spaces in the OpenAI Gym control suite.
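For context on the abstract's remark that the VAE-based generator "optimizes a lower variational bound", the two standard training objectives can be sketched as follows. These are the textbook forms, not equations taken from the paper, and the notation is assumed for illustration: state s, action a, latent variable z with prior p(z) taken to be independent of s, encoder q_φ, decoder p_θ, generator G, discriminator D, and fixed batch B. A conditional VAE maximizes a lower bound on the action log-likelihood, whereas a conditional GAN is trained through a minimax game that does not go through such a bound.

```latex
% Conditional VAE: evidence lower bound (ELBO) on the action log-likelihood
\log p_\theta(a \mid s) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid s, a)}\!\left[\log p_\theta(a \mid s, z)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(z \mid s, a)\,\|\,p(z)\right)

% Conditional GAN: minimax objective over generator G and discriminator D
\min_G \max_D \;
  \mathbb{E}_{(s,a)\sim\mathcal{B}}\!\left[\log D(s, a)\right]
  + \mathbb{E}_{s\sim\mathcal{B},\, z\sim p(z)}\!\left[\log\bigl(1 - D(s, G(s, z))\bigr)\right]
```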
Pages: 4
Related papers
50 in total
  • [41] Intrusion Detection Based on Generative Adversarial Network of Reinforcement Learning Strategy for Wireless Sensor Networks
    Tu J.
    Ogola W.
    Xu D.
    Xie W.
    International Journal of Circuits, Systems and Signal Processing, 2022, 16 : 478 - 482
  • [42] Improving Prediction Accuracy in Building Performance Models Using Generative Adversarial Networks (GANs)
    Chokwitthaya, Chanachok
    Collier, Edward
    Zhu, Yimin
    Mukhopadhyay, Supratik
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019
  • [43] On improving the performance of glitch classification for gravitational wave detection by using Generative Adversarial Networks
    Yan, Jianqi
    Leung, Alex P.
    Hui, C. Y.
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2022, 515 (03) : 4606 - 4621
  • [44] Reinforcement bond performance in 3D concrete printing: Explainable ensemble learning augmented by deep generative adversarial networks
    Wang, Xianlin
    Banthia, Nemkumar
    Yoo, Doo-Yeol
    AUTOMATION IN CONSTRUCTION, 2024, 158
  • [45] Orthogonal Adversarial Deep Reinforcement Learning for Discrete- and Continuous-Action Problems
    Ohashi, Kohei
    Nakanishi, Kosuke
    Goto, Nao
    Yasui, Yuji
    Ishii, Shin
    IEEE ACCESS, 2024, 12 : 151907 - 151919
  • [46] Convergent Reinforcement Learning Control with Neural Networks and Continuous Action Search
    Lee, Minwoo
    Anderson, Charles W.
    2014 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING (ADPRL), 2014 : 33 - 40
  • [47] Improving Astronomical Time-series Classification via Data Augmentation with Generative Adversarial Networks
    Garcia-Jara, German
    Protopapas, Pavlos
    Estevez, Pablo A.
    ASTROPHYSICAL JOURNAL, 2022, 935 (01)
  • [48] Improving Reinforcement Learning control via online bilinear action interpolation
    Ribeiro, CHC
    Hemerly, EM
    VTH BRAZILIAN SYMPOSIUM ON NEURAL NETWORKS, PROCEEDINGS, 1998 : 102 - 105
  • [49] Machinery Health Monitoring Based on Unsupervised Feature Learning via Generative Adversarial Networks
    Dai, Jun
    Wang, Jun
    Huang, Weiguo
    Shi, Juanjuan
    Zhu, Zhongkui
    IEEE-ASME TRANSACTIONS ON MECHATRONICS, 2020, 25 (05) : 2252 - 2263
  • [50] Adversarial Constrained Bidding via Minimax Regret Optimization with Causality-Aware Reinforcement Learning
    Wang, Haozhe
    Du, Chao
    Pang, Panyan
    He, Li
    Wang, Liang
    Zheng, Bo
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023 : 2314 - 2325