Multiscale Q-learning with linear function approximation

Cited: 0
Authors
Shalabh Bhatnagar
K. Lakshmanan
Affiliations
[1] Indian Institute of Science, Department of Computer Science and Automation
[2] National University of Singapore, Department of Mechanical Engineering
Source
Discrete Event Dynamic Systems | 2016 / Volume 26
Keywords
Q-learning with linear function approximation; Reinforcement learning; Stochastic approximation; Ordinary differential equation; Differential inclusion; Multi-stage stochastic shortest path problem
DOI
Not available
Abstract
In this article we present a two-timescale variant of Q-learning with linear function approximation. Both the Q-values and the policies are assumed to be parameterized, with the policy parameter updated on a faster timescale than the Q-value parameter. This timescale separation is seen to yield significantly improved numerical performance of the proposed algorithm over Q-learning. We show that the proposed algorithm converges almost surely to a closed, connected, internally chain-transitive invariant set of an associated differential inclusion.
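The two-timescale structure described in the abstract can be made concrete with a small sketch: one parameter vector carries the linear Q-value approximation and is updated with a slowly decaying step size, while a softmax policy parameter is updated with a larger step size on the faster timescale. The toy MDP, random feature map, step-size exponents, and the specific update rules below are illustrative assumptions chosen for readability; they are not the exact algorithm analyzed in the paper.

```python
# A minimal, illustrative sketch of a two-timescale Q-learning scheme with
# linear function approximation on a small synthetic MDP.  All modeling
# choices here (features, softmax policy, step sizes, targets) are assumptions
# for illustration, not the authors' algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: nS states, nA actions, random transitions and rewards.
nS, nA, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(nS, nA))        # expected reward for (s, a)

# Linear architecture: Q_theta(s, a) = theta . phi(s, a).
d = 8                                           # feature dimension (assumed)
Phi = rng.normal(size=(nS, nA, d)) / np.sqrt(d) # fixed random features phi(s, a)

theta = np.zeros(d)                             # Q-value parameter (slower timescale)
w = np.zeros(d)                                 # policy parameter (faster timescale)

def q_values(s):
    """Vector of approximate Q-values Q_theta(s, .)."""
    return Phi[s] @ theta

def policy(s):
    """Softmax (Gibbs) policy pi_w(.|s) proportional to exp(w . phi(s, a))."""
    prefs = Phi[s] @ w
    prefs -= prefs.max()                        # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

s = rng.integers(nS)
for n in range(1, 50_001):
    # Two-timescale step sizes: the policy step b_n decays more slowly than
    # the Q-value step a_n, so the policy parameter moves on the faster timescale.
    a_n = 1.0 / n                               # slow (Q-value parameter)
    b_n = 1.0 / n ** 0.55                       # fast (policy parameter)

    pi = policy(s)
    a = rng.choice(nA, p=pi)
    r = R[s, a] + 0.1 * rng.normal()            # noisy reward sample
    s_next = rng.choice(nS, p=P[s, a])

    # Slow timescale: TD-style update of theta using the current policy's
    # expected value at the next state (an expected-Sarsa-like target).
    target = r + gamma * policy(s_next) @ q_values(s_next)
    delta = target - q_values(s)[a]
    theta += a_n * delta * Phi[s, a]

    # Fast timescale: policy-gradient-like step that pushes pi_w toward
    # actions with higher estimated Q-values under the current theta.
    grad_log_pi = Phi[s, a] - pi @ Phi[s]       # grad_w log pi_w(a|s)
    w += b_n * grad_log_pi * q_values(s)[a]

    s = s_next

greedy = [int(np.argmax(q_values(s_))) for s_ in range(nS)]
print("greedy actions under learned Q:", greedy)
```

Under standard two-timescale conditions the ratio a_n / b_n tends to zero, so from the policy update's point of view the Q-value parameter is quasi-static, which is the separation the paper's convergence analysis exploits.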
Pages: 477-509
Number of pages: 32
Related papers
50 records in total
  • [21] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [22] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [23] Multi-Agent Q-Learning with Joint State Value Approximation
    Chen Gang
    Cao Weihua
    Chen Xin
    Wu Min
    2011 30TH CHINESE CONTROL CONFERENCE (CCC), 2011, : 4878 - 4882
  • [24] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
    Ohnishi, Shota
    Uchibe, Eiji
    Yamaguchi, Yotaro
    Nakanishi, Kosuke
    Yasui, Yuji
    Ishii, Shin
    FRONTIERS IN NEUROROBOTICS, 2019, 13
  • [25] Non-linear control based on Q-learning algorithms
    Yang, Dong
    Yin, Chang-Ming
    Chen, Huan-Wen
    Wu, Bo-Sen
    Changsha Dianli Xueyuan Xuebao/Journal of Changsha University of Electric Power, 2003, 18 (01):
  • [26] Safe Q-learning for continuous-time linear systems
    Bandyopadhyay, Soutrik
    Bhasin, Shubhendu
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 241 - 246
  • [27] Learning rates for Q-Learning
    Even-Dar, E
    Mansour, Y
    COMPUTATIONAL LEARNING THEORY, PROCEEDINGS, 2001, 2111 : 589 - 604
  • [28] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5 : 1 - 25
  • [29] Optimal Control Inspired Q-Learning for Switched Linear Systems
    Chen, Hua
    Zheng, Linfang
    Zhang, Wei
    2020 AMERICAN CONTROL CONFERENCE (ACC), 2020, : 4003 - 4010
  • [30] The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond
    Woo, Jiin
    Joshi, Gauri
    Chi, Yuejie
    JOURNAL OF MACHINE LEARNING RESEARCH, 2025, 26 : 1 - 85