On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Cited by: 0
Authors
Zhang, Junyu [1 ]
Ni, Chengzhuo [2 ]
Yu, Zheng [2 ]
Szepesvari, Csaba [3 ]
Wang, Mengdi [2 ]
Affiliations
[1] Natl Univ Singapore, Dept Ind Syst Engn & Management, Singapore 119077, Singapore
[2] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA
[3] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Keywords
ALGORITHMS;
DOI
Not available
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Policy gradient (PG) gives rise to a rich class of reinforcement learning (RL) methods. Recently, there has been an emerging trend to accelerate existing PG methods such as REINFORCE via variance reduction techniques. However, all existing variance-reduced PG methods rely heavily on an uncheckable importance-weight assumption made for every single iteration of the algorithms. In this paper, a simple gradient truncation mechanism is proposed to address this issue. Moreover, we design a Truncated Stochastic Incremental Variance-Reduced Policy Gradient (TSIVR-PG) method, which is able to maximize not only a cumulative sum of rewards but also a general utility function of a policy's long-term visitation distribution. We show an Õ(ε⁻³) sample complexity for TSIVR-PG to find an ε-stationary policy. By assuming overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a globally ε-optimal policy with Õ(ε⁻²) samples.
Pages: 13
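The abstract's two key ingredients — a recursive (SARAH-style) variance-reduced gradient estimator and a truncation step that keeps successive policies close, which is what bounds the importance weights — can be illustrated on a toy softmax bandit. This is a minimal sketch, not the paper's algorithm: the arm rewards, step size `eta`, truncation radius `delta`, and batch sizes are all hypothetical choices, and the real TSIVR-PG forms importance-weighted gradient differences on shared trajectories rather than resampling independently as done here.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reinforce_grad(theta, n):
    """Monte-Carlo REINFORCE gradient for a 2-armed bandit."""
    rewards = np.array([1.0, 0.3])          # hypothetical arm rewards
    p = softmax(theta)
    arms = rng.choice(2, size=n, p=p)
    g = np.zeros_like(theta)
    for a in arms:
        grad_log = -p.copy()
        grad_log[a] += 1.0                  # d/dtheta log pi(a) = e_a - p
        g += rewards[a] * grad_log
    return g / n

def tsivr_pg_sketch(theta0, eta=0.5, delta=0.1, epochs=5, inner=10):
    theta = theta0.astype(float).copy()
    for _ in range(epochs):
        v = reinforce_grad(theta, n=1000)   # large-batch anchor gradient
        for _ in range(inner):
            step = eta * v
            norm = np.linalg.norm(step)
            if norm > delta:                # gradient truncation: cap the
                step *= delta / norm        # move so consecutive policies
            theta_new = theta + step        # stay close together
            # recursive variance-reduced update of the gradient estimate
            v = v + reinforce_grad(theta_new, n=50) - reinforce_grad(theta, n=50)
            theta = theta_new
    return theta
```

Running `tsivr_pg_sketch(np.zeros(2))` drives the policy toward the higher-reward arm; the per-step cap `delta` plays the role the paper's truncation mechanism plays, replacing the per-iteration bounded-importance-weight assumption with a condition the algorithm enforces itself.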