Successive Over-Relaxation Q-Learning

Cited by: 6

Authors:

Kamanchi, Chandramouli [1]

Diddigi, Raghuram Bharadwaj [1]

Bhatnagar, Shalabh [1,2]

Affiliations:

[1] Indian Inst Sci, Dept Comp Sci & Automat, Bengaluru 560012, India

[2] Indian Inst Sci, Dept Robert Bosch Ctr Cyber Phys Syst, Bengaluru 560012, India

Source:

IEEE CONTROL SYSTEMS LETTERS | 2020 / Vol. 4 / Issue 01

Keywords:

Machine learning; stochastic optimal control; stochastic systems

DOI:

10.1109/LCSYS.2019.2921158

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In a discounted reward Markov decision process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation, and a fixed-point iteration scheme known as value iteration is used to obtain the solution. In the literature, a successive over-relaxation (SOR)-based value iteration scheme has been proposed to speed up the computation of the optimal value function. The speed-up is achieved by constructing a modified Bellman equation that ensures faster convergence to the optimal value function. However, in many practical applications the model information is not known, and we resort to reinforcement learning (RL) algorithms to obtain an optimal policy and value function. One such popular algorithm is Q-learning. In this letter, we propose SOR Q-learning. We first derive a modified fixed-point iteration for SOR Q-values and utilize stochastic approximation to derive a learning algorithm that computes the optimal value function and an optimal policy. We then prove the almost sure convergence of SOR Q-learning to the SOR Q-values. Finally, through numerical experiments, we show that SOR Q-learning is faster than the standard Q-learning algorithm.
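
The abstract does not state the update rule itself, so the following is only a minimal tabular sketch under an assumed form of the SOR Q-learning step: the usual Q-learning target is weighted by a relaxation parameter `w`, and the remaining weight `1 - w` is placed on the greedy value at the current state. The function names (`sor_q_update`, `run`), the toy MDP layout (`P[s, a, s']`, `R[s, a]`), the uniform exploration, and the `1 / visits` step size are all illustrative assumptions and are not taken from the letter. The admissible range of `w` is not given in this record.

```python
import numpy as np


def sor_q_update(Q, s, a, r, s_next, gamma, w, lr):
    """One tabular step of the assumed SOR Q-learning update; returns a new table."""
    Q = Q.copy()
    # Relaxed target (assumed form): w times the standard Q-learning target,
    # plus (1 - w) times the greedy value at the current state s.
    # Setting w = 1 recovers the standard Q-learning target.
    target = w * (r + gamma * Q[s_next].max()) + (1.0 - w) * Q[s].max()
    Q[s, a] = (1.0 - lr) * Q[s, a] + lr * target
    return Q


def run(P, R, gamma, w, n_steps, seed=0):
    """Run the sketch on a toy MDP with transitions P[s, a, s'] and rewards R[s, a]."""
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_steps):
        a = rng.integers(n_actions)               # uniform exploration (assumption)
        s_next = rng.choice(n_states, p=P[s, a])  # sample the next state
        visits[s, a] += 1
        lr = 1.0 / visits[s, a]                   # diminishing step size (assumption)
        Q = sor_q_update(Q, s, a, R[s, a], s_next, gamma, w, lr)
        s = s_next
    return Q
```

Calling `run(P, R, gamma, w=1.0, n_steps=...)` gives a plain Q-learning baseline against which other values of `w` can be compared on the same toy problem.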


Pages: 55-60

Page count: 6
