Recursive Least-Squares Temporal Difference With Gradient Correction

Cited by: 1
Authors
Song, Tianheng [1]
Li, Dazi [1]
Yang, Weimin [2]
Hirasawa, Kotaro [1]
Affiliations
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Dept Automat, Beijing 100029, Peoples R China
[2] Beijing Univ Chem Technol, Coll Mech & Elect Engn, Dept Mech Engn, Beijing 100029, Peoples R China
Funding
Beijing Natural Science Foundation; China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Policy evaluation; reinforcement learning (RL); temporal differences (TDs); value function approximation; POLICY ITERATION; APPROXIMATION;
DOI
10.1109/TCYB.2019.2902342
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Since the late 1980s, temporal difference (TD) learning has dominated research on policy evaluation algorithms. However, the need to avoid the defects of TD, such as low data efficiency and divergence in off-policy learning, has inspired a large number of novel TD-based approaches. Gradient-based and least-squares-based algorithms make up the major part of these new approaches. This paper aims to combine the advantages of these two categories to derive an efficient policy evaluation algorithm with O(n^2) per-time-step runtime complexity. The least-squares-based framework is adopted, and the gradient correction is used to improve convergence performance. The paper begins by revising a previous O(n^3) batch algorithm, least-squares TD with gradient correction (LS-TDC), to regularize the parameter vector. Based on the recursive least-squares technique, an O(n^2) counterpart of LS-TDC called RC is proposed. To increase data efficiency, we generalize RC with eligibility traces. An off-policy extension based on importance sampling is also proposed. In addition, a convergence analysis for RC as well as LS-TDC is given. The empirical results on both on-policy and off-policy benchmarks show that RC has a higher estimation accuracy than RLSTD and a significantly lower runtime complexity than LS-TDC.
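The O(n^2) per-time-step cost mentioned in the abstract comes from the recursive least-squares technique: instead of inverting an n-by-n matrix at every step, the inverse is updated in place with a rank-one (Sherman-Morrison) correction. The sketch below illustrates this for plain recursive least-squares TD(0), the kind of RLSTD baseline the abstract compares against. It is not the RC algorithm from the paper: the gradient correction, eligibility traces, and importance sampling are all omitted. The function names, the two-state toy chain, and the initialization constant `delta` are assumptions made for illustration only.

```python
def rlstd_step(theta, P, phi, phi_next, reward, gamma):
    """One RLS-TD(0) update in O(n^2) time; mutates theta and P in place.

    P tracks the inverse of A = sum_t phi_t (phi_t - gamma * phi_{t+1})^T,
    updated with the Sherman-Morrison formula instead of a fresh inversion.
    """
    n = len(theta)
    # v = phi - gamma * phi_next
    v = [phi[i] - gamma * phi_next[i] for i in range(n)]
    # P @ phi and v^T @ P, each an O(n^2) matrix-vector product
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    vT_P = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(vT_P[j] * phi[j] for j in range(n))
    # Prediction error of the current parameters on this transition
    err = reward - sum(v[i] * theta[i] for i in range(n))
    for i in range(n):
        theta[i] += P_phi[i] * err / denom
        for j in range(n):
            P[i][j] -= P_phi[i] * vT_P[j] / denom


def evaluate_two_state_chain(sweeps=500, gamma=0.5, delta=1e3):
    """Hypothetical toy problem: state 0 -> 1 (reward 1), state 1 -> 0 (reward 0)."""
    features = [[1.0, 0.0], [0.0, 1.0]]  # tabular (one-hot) features
    theta = [0.0, 0.0]
    P = [[delta, 0.0], [0.0, delta]]     # P_0 = delta * I (assumed initialization)
    for _ in range(sweeps):
        rlstd_step(theta, P, features[0], features[1], 1.0, gamma)
        rlstd_step(theta, P, features[1], features[0], 0.0, gamma)
    return theta


if __name__ == "__main__":
    print(evaluate_two_state_chain())
```

For this toy chain the Bellman equations V0 = 1 + 0.5 V1 and V1 = 0.5 V0 give V0 = 4/3 and V1 = 2/3, so the returned parameters should approach those values; the small residual comes from the regularizing initial matrix P_0.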
Pages: 4251-4264
Number of pages: 14
Related papers
50 in total
  • [1] Multikernel Recursive Least-Squares Temporal Difference Learning
    Zhang, Chunyuan
    Zhu, Qingxin
    Niu, Xinzheng
    [J]. INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2016, PT III, 2016, 9773 : 205 - 217
  • [2] An Adaptive Policy Evaluation Network Based on Recursive Least Squares Temporal Difference With Gradient Correction
    Li, Dazi
    Wang, Yuting
    Song, Tianheng
    Jin, Qibing
    [J]. IEEE ACCESS, 2018, 6 : 7515 - 7525
  • [3] Kernel Recursive Least-Squares Temporal Difference Algorithms with Sparsification and Regularization
    Zhang, Chunyuan
    Zhu, Qingxin
    Niu, Xinzheng
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2016, 2016
  • [4] Least-Squares temporal difference learning
    Boyan, JA
    [J]. MACHINE LEARNING, PROCEEDINGS, 1999, : 49 - 56
  • [5] Orthogonal Matching Pursuit for Least Squares Temporal Difference with Gradient Correction
    Li, Dazi
    Ma, Chao
    Zhang, Jianqing
    Ma, Xin
    Jin, Qibing
    [J]. 2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 4108 - 4112
  • [6] Kernel-Based Least Squares Temporal Difference With Gradient Correction
    Song, Tianheng
    Li, Dazi
    Cao, Liulin
    Hirasawa, Kotaro
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, 27 (04) : 771 - 782
  • [7] Regularization and Feature Selection in Least Squares Temporal Difference with Gradient Correction
    Li, Dazi
    Li, Luntong
    Song, Tianheng
    Jin, Qibing
    [J]. PROCEEDINGS OF THE 2016 12TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2016, : 2289 - 2293
  • [8] Recursive least-squares temporal difference learning for adaptive traffic signal control at intersection
    Biao Yin
    Mahjoub Dridi
    Abdellah El Moudni
    [J]. Neural Computing and Applications, 2019, 31 : 1013 - 1028
  • [9] Recursive least-squares temporal difference learning for adaptive traffic signal control at intersection
    Yin, Biao
    Dridi, Mahjoub
    El Moudni, Abdellah
    [J]. NEURAL COMPUTING & APPLICATIONS, 2019, 31 (Suppl 2): : 1013 - 1028
  • [10] On the recursive total least-squares
    Pham, C
    Ogunfunmi, T
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1989 - 1992