Deep Q-learning: A robust control approach

Cited by: 9
Authors
Varga, Balazs [1 ]
Kulcsar, Balazs [1 ]
Chehreghani, Morteza Haghir [2 ]
Affiliations
[1] Chalmers Univ Technol, Dept Elect Engn, Hörsalsvägen 11, Gothenburg, Sweden
[2] Chalmers Univ Technol, Dept Comp Sci & Engn, Gothenburg, Sweden
Keywords
controlled learning; deep Q-learning; neural tangent kernel; robust control;
DOI
10.1002/rnc.6457
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Subject Classification Code
0812;
Abstract
This work constructs a bridge between robust control theory and reinforcement learning. Although reinforcement learning has shown admirable results in complex control tasks, the agent's learning behavior is opaque. Meanwhile, system theory offers well-established tools for analyzing and controlling dynamical systems. This article places deep Q-learning into a control-oriented perspective to study its learning dynamics with techniques from robust control. An uncertain linear time-invariant model of the learning dynamics is formulated by means of the neural tangent kernel. This novel approach yields conditions for the stability (convergence) of learning and enables frequency-domain analysis of the agent's behavior. The control-oriented view makes it possible to synthesize robust controllers that inject dynamical rewards as a control input into the loss function to achieve better convergence properties. Three output-feedback controllers are synthesized: a gain-scheduled $\mathscr{H}_2$ controller, a dynamical $\mathscr{H}_\infty$ controller, and a fixed-structure $\mathscr{H}_\infty$ controller. Compared to traditional deep Q-learning techniques, which rely on several heuristics, setting up the learning agent with a control-oriented tuning methodology is more transparent and rests on a well-established literature. The proposed approach uses neither a target network nor a randomized replay memory: the role of the target network is taken over by the control input, which also exploits the temporal dependency of samples (as opposed to a randomized memory buffer). Numerical simulations in different OpenAI Gym environments suggest that the $\mathscr{H}_\infty$-controlled learning can converge faster and achieve higher scores (depending on the environment) than the benchmark double deep Q-learning.
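To make the mechanism described in the abstract concrete, below is a minimal, self-contained sketch of controlled Q-learning: a scalar control input u, driven by the Bellman residual, is injected into the temporal-difference target in place of a frozen target network, and samples are consumed in temporal order rather than drawn from a randomized replay buffer. This is not the paper's implementation; the toy chain MDP, the tabular Q-matrix W, and the proportional gain k_p are illustrative assumptions, whereas the article synthesizes $\mathscr{H}_2$ and $\mathscr{H}_\infty$ output-feedback controllers from an NTK-based uncertain LTI model of the learning dynamics.

```python
import numpy as np

# Illustrative sketch only, NOT the paper's implementation. The chain MDP,
# the tabular Q-matrix W, and the proportional gain k_p are assumptions made
# for demonstration; the article instead synthesizes H2 / H-infinity
# output-feedback controllers from an NTK-based uncertain LTI model.

rng = np.random.default_rng(0)

n_states, n_actions, gamma, lr = 5, 2, 0.9, 0.1
W = rng.normal(scale=0.1, size=(n_states, n_actions))  # tabular Q-function

def env_step(s, a):
    """Toy chain MDP: action 1 moves right, action 0 moves left;
    reward 1 is paid on reaching the rightmost state."""
    s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

u = 0.0    # control input: the "dynamical reward" injected into the loss
k_p = 0.5  # illustrative proportional gain acting on the Bellman residual

s = 0
for _ in range(2000):
    # epsilon-greedy action from the *current* Q-function (no target network)
    a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(np.argmax(W[s]))
    s_next, r = env_step(s, a)

    # the control input u shifts the bootstrap target; no frozen copy of W
    td_error = (r + u + gamma * np.max(W[s_next])) - W[s, a]
    W[s, a] += lr * td_error  # gradient step on the loss 0.5 * td_error**2
    u = -k_p * td_error       # controller output applied to the next,
                              # temporally consecutive sample (no replay)

    s = 0 if s_next == n_states - 1 else s_next  # episodic reset at the goal

# greedy policy; the terminal state's row is never updated and stays random
print("greedy actions per state:", np.argmax(W, axis=1))
```

With k_p = 0 this reduces to plain online Q-learning; the paper's point is that the map from the learning error to u can be designed with robust-control tools (stability margins, frequency-domain specifications) rather than tuned heuristically.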
Pages: 526-544
Number of pages: 19