Variance-Reduced Conservative Policy Iteration

Cited: 0
Authors
Agarwal, Naman [1 ]
Bullins, Brian [2 ]
Singh, Karan [3 ]
Institutions
[1] Google AI Princeton, Princeton, NJ 08544 USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Carnegie Mellon Univ, Tepper Sch Business, Pittsburgh, PA 15213 USA
Keywords
variance reduction; reinforcement learning; non-convex optimization;
DOI
Not available
Chinese Library Classification (CLC) number
TP [automation technology; computer technology];
Discipline classification code
0812 ;
Abstract
We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reduction-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an epsilon-functional local optimum from O(epsilon^(-4)) to O(epsilon^(-3)). Under state-coverage and policy-completeness assumptions, the algorithm enjoys epsilon-global optimality after sampling O(epsilon^(-2)) times, improving upon the previously established O(epsilon^(-3)) sample requirement.
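The abstract's key ingredient is variance reduction: instead of estimating each gradient (or advantage) from fresh samples alone, one corrects a cheap stochastic estimate with the difference to an estimate at a fixed anchor point whose exact value is known. The sketch below illustrates this generic SVRG-style control-variate idea on a toy least-squares problem; it is not the paper's algorithm, and all names (`grad_i`, `full_grad`, the anchor construction) are illustrative assumptions.

```python
import numpy as np

# Toy least-squares problem: minimize 0.5 * ||A w - b||^2 / n.
# NOT the paper's algorithm; this only demonstrates the generic
# variance-reduction trick shared by SVRG-style estimators.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(w, i):
    # Gradient of the single-sample loss 0.5 * (A[i] @ w - b[i])**2.
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    # Exact gradient of the averaged loss.
    return A.T @ (A @ w - b) / n

w = rng.normal(size=d)                       # current iterate
w_anchor = w + 0.05 * rng.normal(size=d)     # nearby anchor point
mu = full_grad(w_anchor)                     # exact gradient at the anchor

# Compare the variance of two unbiased single-sample estimators of
# full_grad(w): the plain one, and the control-variate-corrected one.
plain, svrg = [], []
for _ in range(2000):
    i = rng.integers(n)
    plain.append(grad_i(w, i))
    svrg.append(grad_i(w, i) - grad_i(w_anchor, i) + mu)

var_plain = np.mean(np.var(plain, axis=0))
var_svrg = np.mean(np.var(svrg, axis=0))
```

Because the correction term `grad_i(w, i) - grad_i(w_anchor, i)` shrinks as the iterate approaches the anchor, `var_svrg` is far smaller than `var_plain` while both estimators remain unbiased; fewer samples then suffice for the same estimation accuracy, which is the mechanism behind the improved sample complexities quoted above.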
Pages: 3 / 33 (31 pages)
Related papers
50 items total
  • [1] Stochastic Variance-Reduced Policy Gradient
    Papini, Matteo
    Binaghi, Damiano
    Canonaco, Giuseppe
    Pirotta, Matteo
    Restelli, Marcello
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [2] On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
    Zhang, Junyu
    Ni, Chengzhuo
    Yu, Zheng
    Szepesvari, Csaba
    Wang, Mengdi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [3] An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient
    Xu, Pan
    Gao, Felicia
    Gu, Quanquan
    [J]. 35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115 : 541 - 551
  • [4] Variance-reduced sampling importance resampling
    Xiao, Yao
    Fu, Kang
    Li, Kun
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2024,
  • [5] Variance-Reduced Methods for Machine Learning
    Gower, Robert M.
    Schmidt, Mark
    Bach, Francis
    Richtarik, Peter
    [J]. PROCEEDINGS OF THE IEEE, 2020, 108 (11) : 1968 - 1983
  • [6] Sample complexity of variance-reduced policy gradient: weaker assumptions and lower bounds
    Paczolay, Gabor
    Papini, Matteo
    Metelli, Alberto Maria
    Harmati, Istvan
    Restelli, Marcello
    [J]. MACHINE LEARNING, 2024, 113 (09) : 6475 - 6510
  • [7] Accelerating variance-reduced stochastic gradient methods
    Driggs, Derek
    Ehrhardt, Matthias J.
    Schönlieb, Carola-Bibiane
    [J]. MATHEMATICAL PROGRAMMING, 2022, 191 (02) : 671 - 715
  • [8] Stochastic Variance-Reduced Cubic Regularization Methods
    Zhou, Dongruo
    Xu, Pan
    Gu, Quanquan
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2019, 20