Recursive Least-Squares Temporal Difference With Gradient Correction

Times Cited: 1
Authors
Song, Tianheng [1 ]
Li, Dazi [1 ]
Yang, Weimin [2 ]
Hirasawa, Kotaro [1 ]
Affiliations
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Dept Automat, Beijing 100029, Peoples R China
[2] Beijing Univ Chem Technol, Coll Mech & Elect Engn, Dept Mech Engn, Beijing 100029, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Policy evaluation; reinforcement learning (RL); temporal differences (TDs); value function approximation; POLICY ITERATION; APPROXIMATION;
DOI
10.1109/TCYB.2019.2902342
Chinese Library Classification (CLC)
TP [Automation and Computer Technology];
Discipline Code
0812;
Abstract
Since the late 1980s, temporal difference (TD) learning has dominated research on policy evaluation algorithms. However, the need to avoid TD's defects, such as low data efficiency and divergence in off-policy learning, has inspired a large number of novel TD-based approaches. Gradient-based and least-squares-based algorithms make up the major part of these new approaches. This paper aims to combine the advantages of these two categories to derive an efficient policy evaluation algorithm with O(n^2) per-time-step runtime complexity. The least-squares-based framework is adopted, and gradient correction is used to improve convergence performance. The paper begins by revising a previous O(n^3) batch algorithm, least-squares TD with gradient correction (LS-TDC), to regularize the parameter vector. Based on the recursive least-squares technique, an O(n^2) counterpart of LS-TDC, called RC, is proposed. To increase data efficiency, we generalize RC with eligibility traces. An off-policy extension based on importance sampling is also proposed. In addition, convergence analyses are given for both RC and LS-TDC. Empirical results on both on-policy and off-policy benchmarks show that RC achieves higher estimation accuracy than RLSTD and significantly lower runtime complexity than LS-TDC.
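To make the recursive least-squares idea in the abstract concrete, below is a minimal sketch of the classic recursive least-squares TD(0) update (the RLSTD baseline the paper compares against), which achieves O(n^2) per-step cost by propagating the inverse of the accumulated feature matrix with the Sherman-Morrison formula. This is not the authors' RC algorithm (which additionally incorporates gradient correction, eligibility traces, and importance sampling); the function and variable names here are illustrative.

```python
import numpy as np

def rlstd_step(theta, P, phi, phi_next, reward, gamma):
    """One recursive least-squares TD(0) update (Sherman-Morrison form).

    theta : current weight vector, shape (n,)
    P     : running inverse of A = delta*I + sum_t phi_t (phi_t - gamma*phi_{t+1})^T
    Cost per step is O(n^2), the complexity class targeted by the paper.
    """
    d = phi - gamma * phi_next           # TD feature difference
    Pphi = P @ phi
    k = Pphi / (1.0 + d @ Pphi)          # gain vector via Sherman-Morrison
    theta = theta + k * (reward - d @ theta)
    P = P - np.outer(k, d @ P)           # rank-one downdate of the inverse
    return theta, P

# Toy check: a single self-looping state with reward 1 and gamma = 0.5
# has true value 1 / (1 - 0.5) = 2 under the constant feature phi = [1].
n = 1
theta = np.zeros(n)
P = 1e3 * np.eye(n)                      # P = (1/delta) I, delta a small regularizer
phi = np.array([1.0])
for _ in range(200):
    theta, P = rlstd_step(theta, P, phi, phi, reward=1.0, gamma=0.5)
print(theta[0])                          # approaches 2.0
```

Because P is updated with a rank-one correction instead of refactoring the full n-by-n system, each step avoids the O(n^3) solve that a batch least-squares method would require.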
Pages: 4251-4264
Page count: 14
Related Papers
50 records total
  • [41] A SYSTOLIC ARRAY FOR RECURSIVE LEAST-SQUARES COMPUTATIONS
    MOONEN, M
    VANDEWALLE, J
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1993, 41 (02) : 906 - 912
  • [42] RECURSIVE LEAST-SQUARES SMOOTHING OF NOISE IN IMAGES
    PANDA, DP
    KAK, AC
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1977, 25 (06): : 520 - 524
  • [43] Quaternion kernel recursive least-squares algorithm
    Wang, Gang
    Qiao, Jingci
    Xue, Rui
    Peng, Bei
    [J]. SIGNAL PROCESSING, 2021, 178
  • [44] RECURSIVE LEAST-SQUARES LADDER ESTIMATION ALGORITHMS
    LEE, DTL
    MORF, M
    FRIEDLANDER, B
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1981, 29 (03): : 627 - 641
  • [45] A Recursive Restricted Total Least-Squares Algorithm
    Rhode, Stephan
    Usevich, Konstantin
    Markovsky, Ivan
    Gauterin, Frank
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2014, 62 (21) : 5652 - 5662
  • [46] Sustainable l2-Regularized Actor-Critic based on Recursive Least-Squares Temporal Difference Learning
    Li, Luntong
    Li, Dazi
    Song, Tianheng
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 1886 - 1891
  • [47] THE GALERKIN GRADIENT LEAST-SQUARES METHOD
    FRANCA, LP
    DUTRADOCARMO, EG
    [J]. COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 1989, 74 (01) : 41 - 54
  • [48] BIAS CORRECTION IN LEAST-SQUARES IDENTIFICATION
    STOICA, P
    SODERSTROM, T
    [J]. INTERNATIONAL JOURNAL OF CONTROL, 1982, 35 (03) : 449 - 457
  • [49] LEAST-SQUARES METHOD FOR OPTICAL CORRECTION
    ROSEN, S
    ELDERT, C
    [J]. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA, 1954, 44 (03) : 250 - 252
  • [50] A LEAST-SQUARES CORRECTION FOR SELECTIVITY BIAS
    OLSEN, RJ
    [J]. ECONOMETRICA, 1980, 48 (07) : 1815 - 1820