On Convergence Rate of MRetrace

Cited by: 0
Authors
Chen, Xingguo [1 ]
Qin, Wangrong [1 ]
Gong, Yu [1 ]
Yang, Shangdong [1 ]
Wang, Wenhao [2 ,3 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
[2] Natl Univ Def Technol, Coll Elect Engn, Changsha 410073, Peoples R China
[3] Natl Univ Def Technol, Sci & Technol Informat Syst Engn Lab, Changsha 410073, Peoples R China
Keywords
finite sample analysis; off-policy learning; minimum eigenvalues; MRetrace;
DOI
10.3390/math12182930
Chinese Library Classification
O1 [Mathematics];
Discipline Code
0701; 070101;
Abstract
Off-policy learning is a key setting for reinforcement learning algorithms. In recent years, the stability of off-policy value-based reinforcement learning has been guaranteed even when combined with linear function approximation and bootstrapping. Convergence rate analysis is currently an active research topic; however, convergence rates differ across learning algorithms, and explaining why remains an open problem. In this paper, we propose an essentially simplified form of the convergence rate for general off-policy temporal difference learning algorithms. We show that the primary determinant of the convergence rate is the minimum eigenvalue of the key matrix. Furthermore, we conduct a comparative analysis of this influencing factor across various off-policy learning algorithms in diverse numerical scenarios. The experimental findings validate the proposed determinant, which serves as a benchmark for the design of more efficient learning algorithms.
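To make the abstract's central quantity concrete, the following is a minimal sketch (not taken from the paper) of how the minimum eigenvalue of a key matrix can be computed for linear off-policy TD methods, assuming the standard form A = Phi^T D_mu (I - gamma * P_pi) Phi; the toy MDP, features, and distributions below are illustrative assumptions only.

# Sketch: minimum eigenvalue of the key matrix for linear off-policy TD.
# Assumes the standard TD key matrix A = Phi^T D (I - gamma * P_pi) Phi;
# all numerical values here are illustrative, not from the paper.
import numpy as np

gamma = 0.9
n_states, n_features = 4, 2
rng = np.random.default_rng(0)

# Target-policy transition matrix P_pi (rows sum to 1) -- assumed toy values.
P_pi = rng.dirichlet(np.ones(n_states), size=n_states)

# Behavior-policy state distribution d_mu -- assumed toy values.
d_mu = rng.dirichlet(np.ones(n_states))
D = np.diag(d_mu)

# Linear feature matrix Phi (one row of features per state) -- assumed toy values.
Phi = rng.standard_normal((n_states, n_features))

# Key matrix A = Phi^T D (I - gamma * P_pi) Phi.
A = Phi.T @ D @ (np.eye(n_states) - gamma * P_pi) @ Phi

# The abstract identifies the minimum eigenvalue (here, of the symmetric part
# of A) as the primary determinant of the convergence rate.
sym = 0.5 * (A + A.T)
lambda_min = np.linalg.eigvalsh(sym).min()
print("min eigenvalue of symmetric part of A:", lambda_min)

Comparing this eigenvalue across the key matrices induced by different off-policy algorithms (e.g., MRetrace versus other TD variants) is, in spirit, the kind of comparative analysis the abstract describes.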
Pages: 19