On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Cited by: 0
Authors
Zhang, Junyu [1 ]
Ni, Chengzhuo [2 ]
Yu, Zheng [2 ]
Szepesvari, Csaba [3 ]
Wang, Mengdi [2 ]
Affiliations
[1] Natl Univ Singapore, Dept Ind Syst Engn & Management, Singapore 119077, Singapore
[2] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA
[3] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Keywords
ALGORITHMS
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Policy gradient (PG) gives rise to a rich class of reinforcement learning (RL) methods. Recently, there has been an emerging trend to accelerate existing PG methods, such as REINFORCE, with variance reduction techniques. However, all existing variance-reduced PG methods rely heavily on an uncheckable importance weight assumption made for every single iteration of the algorithms. In this paper, a simple gradient truncation mechanism is proposed to address this issue. Moreover, we design a Truncated Stochastic Incremental Variance-Reduced Policy Gradient (TSIVR-PG) method, which is able to maximize not only a cumulative sum of rewards but also a general utility function over a policy's long-term visiting distribution. We show an Õ(ε⁻³) sample complexity for TSIVR-PG to find an ε-stationary policy. By assuming overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a globally ε-optimal policy with Õ(ε⁻²) samples.
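The gradient truncation idea described in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's actual TSIVR-PG algorithm: it shows a SARAH/SPIDER-style recursive variance-reduced gradient estimate whose norm is capped before each update, so consecutive policy iterates stay close and importance weights between them remain bounded. The function names (`truncate`, `vr_pg_step`) and the specific clipping rule are illustrative choices, not from the paper.

```python
import numpy as np

def truncate(g, radius):
    """Cap the gradient norm at `radius` so each policy update stays in a
    small ball around the current iterate. (Illustrative of the truncation
    mechanism; the paper's exact operator may differ.)"""
    norm = np.linalg.norm(g)
    if norm <= radius:
        return g
    return g * (radius / norm)

def vr_pg_step(theta, v_prev, grad_new, grad_old_reweighted, lr, radius):
    """One recursive variance-reduced policy-gradient step:
        v_t = grad(theta_t) - w * grad(theta_{t-1}) + v_{t-1},
    where `grad_old_reweighted` stands for the importance-weighted gradient
    at the previous iterate. The estimate is truncated before the ascent
    step on theta."""
    v = grad_new - grad_old_reweighted + v_prev
    v = truncate(v, radius)
    theta_next = theta + lr * v  # gradient *ascent* (reward maximization)
    return theta_next, v
```

Because the truncated estimate has norm at most `radius`, each iterate moves at most `lr * radius`, which is what keeps the per-iteration importance weights controlled without an unverifiable assumption.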
Pages: 13
Related Papers
50 records in total
  • [31] A variance-reduced electrothermal Monte Carlo method for semiconductor device simulation
    Muscato, Orazio
    Di Stefano, Vincenza
    Wagner, Wolfgang
    [J]. COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2013, 65 (03) : 520 - 527
  • [32] Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies
    Li, Oscar
    Harrison, James
    Sohl-Dickstein, Jascha
    Smith, Virginia
    Metz, Luke
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [33] Lock-Free Parallelization for Variance-Reduced Stochastic Gradient Descent on Streaming Data
    Peng, Yaqiong
    Hao, Zhiyu
    Yun, Xiaochun
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (09) : 2220 - 2231
  • [34] Stochastic Variance-Reduced Cubic Regularization Methods
    Zhou, Dongruo
    Xu, Pan
    Gu, Quanquan
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2019, 20
  • [36] Variance-reduced DSMC method for axial-symmetric flows of gaseous mixtures
    Szalmas, Lajos
    [J]. COMPUTERS & FLUIDS, 2013, 74 : 58 - 65
  • [37] A Hybrid Variance-Reduced Method for Decentralized Stochastic Non-Convex Optimization
    Xin, Ran
    Khan, Usman A.
    Kar, Soummya
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [38] Riemannian Stochastic Variance-Reduced Cubic Regularized Newton Method for Submanifold Optimization
    Zhang, Dewei
    Tajbakhsh, Sam Davanloo
    [J]. JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2023, 196 (01) : 324 - 361
  • [40] Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient
    Lin, Tianyi
    Fan, Chengyou
    Wang, Mengdi
    Jordan, Michael I.
    [J]. 2020 AMERICAN CONTROL CONFERENCE (ACC), 2020, : 126 - 131