Deep Q-learning: A robust control approach

Cited by: 9
Authors
Varga, Balazs [1 ]
Kulcsar, Balazs [1 ]
Chehreghani, Morteza Haghir [2 ]
Affiliations
[1] Chalmers Univ Technol, Dept Elect Engn, Horsalsvagen 11, Gothenburg, Sweden
[2] Chalmers Univ Technol, Dept Comp Sci & Engn, Gothenburg, Sweden
Keywords
controlled learning; deep Q-learning; neural tangent kernel; robust control;
DOI
10.1002/rnc.6457
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
This work aims at constructing a bridge between robust control theory and reinforcement learning. Although reinforcement learning has shown admirable results in complex control tasks, the agent's learning behavior is opaque. Meanwhile, system theory has several tools for analyzing and controlling dynamical systems. This article places deep Q-learning into a control-oriented perspective to study its learning dynamics with well-established techniques from robust control. An uncertain linear time-invariant model is formulated by means of the neural tangent kernel to describe learning. This novel approach allows giving conditions for the stability (convergence) of learning and enables the analysis of the agent's behavior in the frequency domain. The control-oriented approach makes it possible to formulate robust controllers that inject dynamical rewards as control input in the loss function to achieve better convergence properties. Three output-feedback controllers are synthesized: a gain-scheduling $\mathscr{H}_2$ controller, a dynamical $\mathscr{H}_{\infty}$ controller, and a fixed-structure $\mathscr{H}_{\infty}$ controller. Compared to traditional deep Q-learning techniques, which involve several heuristics, setting up the learning agent with a control-oriented tuning methodology is more transparent and has well-established literature. The proposed approach uses neither a target network nor a randomized replay memory. The role of the target network is taken over by the control input, which also exploits the temporal dependency of samples (as opposed to a randomized memory buffer). Numerical simulations in different OpenAI Gym environments suggest that the $\mathscr{H}_{\infty}$-controlled learning can converge faster and receive higher scores (depending on the environment) compared to the benchmark double deep Q-learning.
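As a rough illustration of the idea described in the abstract, the sketch below shows a deep Q-learning update loop in which the usual target network and randomized replay memory are replaced by a control input added to the temporal-difference loss. This is not the authors' implementation: the environment, network size, hyperparameters, and the simple proportional gain K on the TD error (standing in for the paper's synthesized H-infinity controller) are all assumptions.

```python
# Illustrative sketch only (not the paper's implementation): deep Q-learning
# without a target network or randomized replay, with a "control input" u
# injected into the TD loss. Assumes the Gym >= 0.26 reset/step API; the
# gain K is a hypothetical stand-in for a synthesized robust controller.
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
n_obs = env.observation_space.shape[0]
n_act = env.action_space.n

q_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_act))
opt = torch.optim.SGD(q_net.parameters(), lr=1e-3)
gamma, eps, K = 0.99, 0.1, 0.5  # K: hypothetical controller gain

obs, _ = env.reset()
for step in range(10_000):
    state = torch.as_tensor(obs, dtype=torch.float32)
    with torch.no_grad():
        q_values = q_net(state)
    action = env.action_space.sample() if torch.rand(()) < eps else int(q_values.argmax())
    next_obs, reward, terminated, truncated, _ = env.step(action)

    # TD target built from the same network (no target network); samples are
    # consumed in temporal order (no randomized replay buffer).
    q_sa = q_net(state)[action]
    with torch.no_grad():
        next_state = torch.as_tensor(next_obs, dtype=torch.float32)
        q_next = torch.tensor(0.0) if terminated else q_net(next_state).max()
        td_error = reward + gamma * q_next - q_sa
        u = K * td_error  # control input: a dynamical reward shaping the loss

    loss = (reward + gamma * q_next + u - q_sa) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

    obs = next_obs
    if terminated or truncated:
        obs, _ = env.reset()
```

In the paper the control input is produced by an output-feedback controller derived from the NTK-based linear time-invariant model of the learning dynamics; the proportional term above only indicates where such an input enters the loss.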
Pages: 526-544
Number of pages: 19