Recursive Least-Squares Temporal Difference With Gradient Correction

Cited by: 1

Authors:

Song, Tianheng [1]

Li, Dazi [1]

Yang, Weimin [2]

Hirasawa, Kotaro [1]

Affiliations:

[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Dept Automat, Beijing 100029, Peoples R China

[2] Beijing Univ Chem Technol, Coll Mech & Elect Engn, Dept Mech Engn, Beijing 100029, Peoples R China

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2021 / Vol. 51 / No. 08

Funding:

Beijing Natural Science Foundation; China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Policy evaluation; reinforcement learning (RL); temporal differences (TDs); value function approximation; POLICY ITERATION; APPROXIMATION;

DOI:

10.1109/TCYB.2019.2902342

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Since the late 1980s, temporal difference (TD) learning has dominated the research area of policy evaluation algorithms. However, the demand for the avoidance of TD defects, such as low data-efficiency and divergence in off-policy learning, has inspired the studies of a large number of novel TD-based approaches. Gradient-based and least-squares-based algorithms comprise the major part of these new approaches. This paper aims to combine advantages of these two categories to derive an efficient policy evaluation algorithm with O(n^2) per-time-step runtime complexity. The least-squares-based framework is adopted, and the gradient correction is used to improve convergence performance. This paper begins with the revision of a previous O(n^3) batch algorithm, least-squares TD with a gradient correction (LS-TDC) to regularize the parameter vector. Based on the recursive least-squares technique, an O(n^2) counterpart of LS-TDC called RC is proposed. To increase data efficiency, we generalize RC with eligibility traces. An off-policy extension is also proposed based on importance sampling. In addition, the convergence analysis for RC as well as LS-TDC is given. The empirical results in both on-policy and off-policy benchmarks show that RC has a higher estimation accuracy than that of RLSTD and a significantly lower runtime complexity than that of LS-TDC.
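The abstract attributes the O(n^2) per-step cost to the recursive least-squares technique but this record does not reproduce the RC update equations. As a rough illustration only, the sketch below shows the plain RLS-TD(0) baseline (the kind of RLSTD method the abstract compares against), not the paper's RC algorithm: the inverse matrix is maintained with a Sherman-Morrison rank-one update, so each step costs O(n^2) instead of an O(n^3) solve. The function names, the two-state chain, and the initial value `p0` are assumptions made for this example.

```python
def rlstd_update(theta, P, phi, reward, phi_next, gamma):
    """One step of plain RLS-TD(0) (baseline sketch, not the paper's RC).

    P tracks the inverse of A = eps*I + sum(phi * (phi - gamma*phi')^T)
    through a Sherman-Morrison rank-one update, costing O(n^2) per call.
    """
    n = len(theta)
    # d = phi - gamma * phi'
    d = [phi[i] - gamma * phi_next[i] for i in range(n)]
    # P @ phi and d^T @ P, each an O(n^2) matrix-vector product
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    d_P = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(d[i] * P_phi[i] for i in range(n))
    gain = [x / denom for x in P_phi]
    # Residual of the new sample under the current parameters
    residual = reward - sum(d[i] * theta[i] for i in range(n))
    theta = [theta[i] + gain[i] * residual for i in range(n)]
    # Rank-one correction of the inverse
    P = [[P[i][j] - gain[i] * d_P[j] for j in range(n)] for i in range(n)]
    return theta, P


def evaluate_two_state_chain(cycles=200, gamma=0.5, p0=1000.0):
    """Hypothetical toy problem: state 0 -> 1 (reward 0), state 1 -> 0 (reward 1)."""
    features = [[1.0, 0.0], [0.0, 1.0]]  # tabular (one-hot) features
    theta = [0.0, 0.0]
    P = [[p0, 0.0], [0.0, p0]]  # P0 = I / eps with eps = 1 / p0
    for _ in range(cycles):
        theta, P = rlstd_update(theta, P, features[0], 0.0, features[1], gamma)
        theta, P = rlstd_update(theta, P, features[1], 1.0, features[0], gamma)
    return theta
```

For this toy chain the exact values solve V0 = 0.5 * V1 and V1 = 1 + 0.5 * V0, giving V0 = 2/3 and V1 = 4/3, and `evaluate_two_state_chain()` should return estimates close to those. The gradient correction, eligibility traces, and importance-sampling extension described in the abstract are not part of this sketch.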

Pages: 4251 - 4264

Number of pages: 14

50 records in total

[1] Multikernel Recursive Least-Squares Temporal Difference Learning
Zhang, Chunyuan
Zhu, Qingxin
Niu, Xinzheng
[J]. INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2016, PT III, 2016, 9773 : 205 - 217
[2] An Adaptive Policy Evaluation Network Based on Recursive Least Squares Temporal Difference With Gradient Correction
Li, Dazi
Wang, Yuting
Song, Tianheng
Jin, Qibing
[J]. IEEE ACCESS, 2018, 6 : 7515 - 7525
[3] Kernel Recursive Least-Squares Temporal Difference Algorithms with Sparsification and Regularization
Zhang, Chunyuan
Zhu, Qingxin
Niu, Xinzheng
[J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2016, 2016
[4] Least-Squares temporal difference learning
Boyan, JA
[J]. MACHINE LEARNING, PROCEEDINGS, 1999 : 49 - 56
[5] Orthogonal Matching Pursuit for Least Squares Temporal Difference with Gradient Correction
Li, Dazi
Ma, Chao
Zhang, Jianqing
Ma, Xin
Jin, Qibing
[J]. 2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020 : 4108 - 4112
[6] Kernel-Based Least Squares Temporal Difference With Gradient Correction
Song, Tianheng
Li, Dazi
Cao, Liulin
Hirasawa, Kotaro
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, 27 (04) : 771 - 782
[7] Regularization and Feature Selection in Least Squares Temporal Difference with Gradient Correction
Li, Dazi
Li, Luntong
Song, Tianheng
Jin, Qibing
[J]. PROCEEDINGS OF THE 2016 12TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2016 : 2289 - 2293
[8] Recursive least-squares temporal difference learning for adaptive traffic signal control at intersection
Yin, Biao
Dridi, Mahjoub
El Moudni, Abdellah
[J]. NEURAL COMPUTING & APPLICATIONS, 2019, 31 (Suppl 2) : 1013 - 1028
[9] On the recursive total least-squares
Pham, C
Ogunfunmi, T
[J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997 : 1989 - 1992
