Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions

Times cited: 0
Authors
Ohashi, Kohei [1]
Nakanishi, Kosuke [1,2]
Yasui, Yuji [2]
Ishii, Shin [1,3,4]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Dept Syst Sci, Kyoto 6068501, Japan
[2] Honda Res & Dev Co Ltd, Saitama 3510113, Japan
[3] Univ Tokyo, Inst Adv Study, Int Res Ctr Neurointelligence WPI IRCN, Tokyo 1130033, Japan
[4] Adv Telecommun Res Inst Int ATR, Seika 6190288, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Deep reinforcement learning; adversarial example; robustness; regularization;
DOI
10.1109/ACCESS.2023.3314750
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Over the last decade, methods for autonomous control by artificial intelligence have been extensively developed based on deep reinforcement learning (DRL) technologies. Despite these advances, robustness to noise in observation data remains an issue for autonomous control policies implemented using DRL in practical applications. In this study, we present a general robust adversarial learning technique applicable to DRL. During adversarial learning, policies are trained through regularization learning to output consistent control actions even for adversarial input examples. Importantly, these adversarial examples are produced to lead the current policy to predict the worst action at each state. Although a naive implementation of regularization learning may cause a DRL model to learn a biased objective function, our methods were found to minimize this bias. When implemented as a modification of a deep Q-network for discrete-action problems in Atari 2600 games and of a deep deterministic policy gradient for continuous-action tasks in PyBullet, our new adversarial learning frameworks showed significantly enhanced robustness against adversarial and random noise added to the input, compared with several recently proposed methods.
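The abstract does not spell out the attack or the regularizer, so the following is only an illustrative sketch of the general idea: perturb an observation so that the current value function is pushed toward its lowest-rated action, then penalize any change in the greedy action. The linear Q-function, the single signed-gradient step, the hinge penalty, and all function names are assumptions made for this example; they are not taken from the paper.

```python
def q_values(weights, bias, state):
    """Toy linear Q-function (an assumption): one score per action."""
    return [sum(w * x for w, x in zip(row, state)) + b
            for row, b in zip(weights, bias)]


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def worst_action_perturbation(weights, bias, state, eps):
    """One signed-gradient step, within an L-infinity budget eps, that raises
    the score of the lowest-rated action relative to the greedy action."""
    q = q_values(weights, bias, state)
    best = max(range(len(q)), key=q.__getitem__)
    worst = min(range(len(q)), key=q.__getitem__)
    # For a linear Q, d/ds [Q(s, worst) - Q(s, best)] = W[worst] - W[best].
    grad = [ww - wb for ww, wb in zip(weights[worst], weights[best])]
    return [x + eps * sign(g) for x, g in zip(state, grad)]


def consistency_penalty(weights, bias, state, adv_state):
    """Hinge penalty: positive when the clean greedy action is no longer
    the top-scoring action at the perturbed state."""
    q_clean = q_values(weights, bias, state)
    best = max(range(len(q_clean)), key=q_clean.__getitem__)
    q_adv = q_values(weights, bias, adv_state)
    runner_up = max(v for a, v in enumerate(q_adv) if a != best)
    return max(0.0, runner_up - q_adv[best])


# Hypothetical 3-action, 2-feature example.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
b = [0.0, 0.0, 0.0]
s = [0.6, 0.5]
s_adv = worst_action_perturbation(W, b, s, eps=0.2)
penalty = consistency_penalty(W, b, s, s_adv)
print(s_adv, penalty)
```

In a training loop, a penalty of this kind would be added to the usual temporal-difference loss with some weight; how the paper balances the two terms and avoids the biased objective it mentions is not recoverable from the abstract.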
Pages: 100798-100809
Page count: 12