Multiscale Q-learning with linear function approximation

Cited: 0
Authors
Shalabh Bhatnagar
K. Lakshmanan
Affiliations
[1] Indian Institute of Science, Department of Computer Science and Automation
[2] National University of Singapore, Department of Mechanical Engineering
Source
Discrete Event Dynamic Systems | 2016 / Volume 26
Keywords
Q-learning with linear function approximation; Reinforcement learning; Stochastic approximation; Ordinary differential equation; Differential inclusion; Multi-stage stochastic shortest path problem
DOI
Not available
Abstract
In this article we present a two-timescale variant of Q-learning with linear function approximation. Both the Q-values and the policies are assumed to be parameterized, with the policy parameter updated on a faster timescale than the Q-value parameter. This timescale separation is seen to yield significantly improved numerical performance of the proposed algorithm over Q-learning. We show that the proposed algorithm converges almost surely to a closed, connected, internally chain-transitive invariant set of an associated differential inclusion.
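The two-timescale structure described in the abstract can be made concrete with a small sketch: one parameter vector carries the linear Q-value approximation and is updated with a slowly decaying step size, while a softmax policy parameter is updated with a larger step size on the faster timescale. The toy MDP, random feature map, step-size exponents, and the specific update rules below are illustrative assumptions chosen for readability; they are not the exact algorithm analyzed in the paper.

```python
# A minimal, illustrative sketch of a two-timescale Q-learning scheme with
# linear function approximation on a small synthetic MDP.  All modeling
# choices here (features, softmax policy, step sizes, targets) are assumptions
# for illustration, not the authors' algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: nS states, nA actions, random transitions and rewards.
nS, nA, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(nS, nA))        # expected reward for (s, a)

# Linear architecture: Q_theta(s, a) = theta . phi(s, a).
d = 8                                           # feature dimension (assumed)
Phi = rng.normal(size=(nS, nA, d)) / np.sqrt(d) # fixed random features phi(s, a)

theta = np.zeros(d)                             # Q-value parameter (slower timescale)
w = np.zeros(d)                                 # policy parameter (faster timescale)

def q_values(s):
    """Vector of approximate Q-values Q_theta(s, .)."""
    return Phi[s] @ theta

def policy(s):
    """Softmax (Gibbs) policy pi_w(.|s) proportional to exp(w . phi(s, a))."""
    prefs = Phi[s] @ w
    prefs -= prefs.max()                        # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

s = rng.integers(nS)
for n in range(1, 50_001):
    # Two-timescale step sizes: the policy step b_n decays more slowly than
    # the Q-value step a_n, so the policy parameter moves on the faster timescale.
    a_n = 1.0 / n                               # slow (Q-value parameter)
    b_n = 1.0 / n ** 0.55                       # fast (policy parameter)

    pi = policy(s)
    a = rng.choice(nA, p=pi)
    r = R[s, a] + 0.1 * rng.normal()            # noisy reward sample
    s_next = rng.choice(nS, p=P[s, a])

    # Slow timescale: TD-style update of theta using the current policy's
    # expected value at the next state (an expected-Sarsa-like target).
    target = r + gamma * policy(s_next) @ q_values(s_next)
    delta = target - q_values(s)[a]
    theta += a_n * delta * Phi[s, a]

    # Fast timescale: policy-gradient-like step that pushes pi_w toward
    # actions with higher estimated Q-values under the current theta.
    grad_log_pi = Phi[s, a] - pi @ Phi[s]       # grad_w log pi_w(a|s)
    w += b_n * grad_log_pi * q_values(s)[a]

    s = s_next

greedy = [int(np.argmax(q_values(s_))) for s_ in range(nS)]
print("greedy actions under learned Q:", greedy)
```

Under standard two-timescale conditions the ratio a_n / b_n tends to zero, so from the policy update's point of view the Q-value parameter is quasi-static, which is the separation the paper's convergence analysis exploits.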
Pages: 477-509
Number of pages: 32
Related papers
50 records in total
  • [21] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [22] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [23] Multi-Agent Q-Learning with Joint State Value Approximation
    Chen Gang
    Cao Weihua
    Chen Xin
    Wu Min
    2011 30TH CHINESE CONTROL CONFERENCE (CCC), 2011, : 4878 - 4882
  • [24] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
    Ohnishi, Shota
    Uchibe, Eiji
    Yamaguchi, Yotaro
    Nakanishi, Kosuke
    Yasui, Yuji
    Ishii, Shin
    FRONTIERS IN NEUROROBOTICS, 2019, 13
  • [25] Non-linear control based on Q-learning algorithms
    Yang, Dong
    Yin, Chang-Ming
    Chen, Huan-Wen
    Wu, Bo-Sen
    Changsha Dianli Xueyuan Xuebao/Journal of Changsha University of Electric Power, 2003, 18 (01):
  • [26] Safe Q-learning for continuous-time linear systems
    Bandyopadhyay, Soutrik
    Bhasin, Shubhendu
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 241 - 246
  • [27] Learning rates for Q-Learning
    Even-Dar, E
    Mansour, Y
    COMPUTATIONAL LEARNING THEORY, PROCEEDINGS, 2001, 2111 : 589 - 604
  • [28] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5 : 1 - 25
  • [29] Optimal Control Inspired Q-Learning for Switched Linear Systems
    Chen, Hua
    Zheng, Linfang
    Zhang, Wei
    2020 AMERICAN CONTROL CONFERENCE (ACC), 2020, : 4003 - 4010
  • [30] The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond
    Woo, Jiin
    Joshi, Gauri
    Chi, Yuejie
    JOURNAL OF MACHINE LEARNING RESEARCH, 2025, 26 : 1 - 85