Recursive Least-Squares Temporal Difference With Gradient Correction

Cited by: 1
Authors
Song, Tianheng [1 ]
Li, Dazi [1 ]
Yang, Weimin [2 ]
Hirasawa, Kotaro [1 ]
Affiliations
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Dept Automat, Beijing 100029, Peoples R China
[2] Beijing Univ Chem Technol, Coll Mech & Elect Engn, Dept Mech Engn, Beijing 100029, Peoples R China
Funding
Beijing Natural Science Foundation; China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Policy evaluation; reinforcement learning (RL); temporal differences (TDs); value function approximation; POLICY ITERATION; APPROXIMATION;
DOI
10.1109/TCYB.2019.2902342
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Since the late 1980s, temporal difference (TD) learning has dominated the research area of policy evaluation algorithms. However, the demand for the avoidance of TD defects, such as low data efficiency and divergence in off-policy learning, has inspired the study of a large number of novel TD-based approaches. Gradient-based and least-squares-based algorithms comprise the major part of these new approaches. This paper aims to combine the advantages of these two categories to derive an efficient policy evaluation algorithm with O(n^2) per-time-step runtime complexity. The least-squares-based framework is adopted, and the gradient correction is used to improve convergence performance. This paper begins with a revision of a previous O(n^3) batch algorithm, least-squares TD with gradient correction (LS-TDC), to regularize the parameter vector. Based on the recursive least-squares technique, an O(n^2) counterpart of LS-TDC, called RC, is proposed. To increase data efficiency, we generalize RC with eligibility traces. An off-policy extension is also proposed based on importance sampling. In addition, convergence analyses for both RC and LS-TDC are given. Empirical results on both on-policy and off-policy benchmarks show that RC achieves higher estimation accuracy than RLSTD and significantly lower runtime complexity than LS-TDC.
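The recursive least-squares machinery behind the O(n^2) per-step cost can be illustrated with a plain RLS-TD(0) update based on the Sherman-Morrison identity. This is a generic sketch of the recursive least-squares TD family, not the paper's RC algorithm (RC additionally incorporates a gradient-correction term); all function names and the toy environment are illustrative assumptions.

```python
import numpy as np

def rls_td_update(theta, P, phi, reward, phi_next, gamma):
    """One recursive least-squares TD(0) step (Sherman-Morrison form).

    theta : current weight vector, shape (n,)
    P     : current inverse correlation matrix, shape (n, n)
    Cost is O(n^2) per transition: only matrix-vector products and a
    rank-1 update, no explicit matrix inversion.
    """
    d = phi - gamma * phi_next            # TD feature difference
    Pphi = P @ phi
    k = Pphi / (1.0 + d @ Pphi)           # RLS gain vector
    td_error = reward - d @ theta         # residual w.r.t. current weights
    theta = theta + k * td_error
    P = P - np.outer(k, d @ P)            # rank-1 Sherman-Morrison update
    return theta, P

# Demo (illustrative): deterministic two-state chain s0 -> s1 -> s0 -> ...
# with reward 1 when leaving s0 and 0 when leaving s1. One-hot features
# represent the value function exactly, so theta converges to the true
# values [1/(1 - g^2), g/(1 - g^2)].
g = 0.9
phis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
rewards = [1.0, 0.0]
theta, P = np.zeros(2), 100.0 * np.eye(2)  # large P0 = weak regularization
for t in range(2000):
    s = t % 2
    theta, P = rls_td_update(theta, P, phis[s], rewards[s], phis[1 - s], g)
```

Maintaining P as the inverse of the (regularized) feature correlation matrix is what replaces the O(n^3) batch solve of LS-TDC with an O(n^2) recursion, mirroring the complexity reduction the abstract describes.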
Pages: 4251-4264 (14 pages)
Related Papers
50 records
  • [21] Least-squares temporal difference learning based on an extreme learning machine
    Escandell-Montero, Pablo
    Martinez-Martinez, Jose M.
    Martin-Guerrero, Jose D.
    Soria-Olivas, Emilio
    Gomez-Sanchis, Juan
    [J]. NEUROCOMPUTING, 2014, 141 : 37 - 45
  • [22] RECURSIVE ALGORITHM FOR PARTIAL LEAST-SQUARES REGRESSION
    HELLAND, K
    BERNTSEN, HE
    BORGEN, OS
    MARTENS, H
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 1992, 14 (1-3) : 129 - 137
  • [23] Recursive Variational Inference for Total Least-Squares
    Friml, Dominik
    Vaclavek, Pavel
    [J]. IEEE CONTROL SYSTEMS LETTERS, 2023, 7 : 2839 - 2844
  • [24] Sparsity regularized recursive total least-squares
    Tanc, A. Korhan
    [J]. DIGITAL SIGNAL PROCESSING, 2015, 40 : 176 - 180
  • [25] SYSTOLIC ARRAY FOR RECURSIVE LEAST-SQUARES MINIMIZATION
    MCWHIRTER, JG
    [J]. ELECTRONICS LETTERS, 1983, 19 (18) : 729 - 730
  • [26] Exact initialization of the recursive least-squares algorithm
    Stoica, P
    Åhgren, P
    [J]. INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2002, 16 (03) : 219 - 230
  • [27] A recursive algorithm for nonlinear least-squares problems
    A. Alessandri
    M. Cuneo
    S. Pagnan
    M. Sanguineti
    [J]. Computational Optimization and Applications, 2007, 38 : 195 - 216
  • [28] RECURSIVE LEAST-SQUARES WITH STABILIZED INVERSE FACTORIZATION
    MOONEN, M
    VANDEWALLE, J
    [J]. SIGNAL PROCESSING, 1990, 21 (01) : 1 - 15
  • [29] Deep kernel recursive least-squares algorithm
    Hossein Mohamadipanah
    Mahdi Heydari
    Girish Chowdhary
    [J]. Nonlinear Dynamics, 2021, 104 : 2515 - 2530
  • [30] A Fast Robust Recursive Least-Squares Algorithm
    Rey Vega, Leonardo
    Rey, Hernan
    Benesty, Jacob
    Tressens, Sara
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2009, 57 (03) : 1209 - 1216