Variance-Reduced Conservative Policy Iteration

Cited: 0
Authors
Agarwal, Naman [1 ]
Bullins, Brian [2 ]
Singh, Karan [3 ]
Institutions
[1] Google AI Princeton, Princeton, NJ 08544 USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Carnegie Mellon Univ, Tepper Sch Business, Pittsburgh, PA 15213 USA
Keywords
variance reduction; reinforcement learning; non-convex optimization;
DOI
Not available
Chinese Library Classification (CLC) number
TP [automation technology; computer technology];
Discipline classification code
0812 ;
Abstract
We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reduction-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an epsilon-functional local optimum from O(epsilon^(-4)) to O(epsilon^(-3)). Under state-coverage and policy-completeness assumptions, the algorithm enjoys epsilon-global optimality after sampling O(epsilon^(-2)) times, improving upon the previously established O(epsilon^(-3)) sample requirement.
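The abstract's key ingredient is variance reduction: instead of estimating each gradient (or advantage) from fresh samples alone, one corrects a cheap stochastic estimate with the difference to an estimate at a fixed anchor point whose exact value is known. The sketch below illustrates this generic SVRG-style control-variate idea on a toy least-squares problem; it is not the paper's algorithm, and all names (`grad_i`, `full_grad`, the anchor construction) are illustrative assumptions.

```python
import numpy as np

# Toy least-squares problem: minimize 0.5 * ||A w - b||^2 / n.
# NOT the paper's algorithm; this only demonstrates the generic
# variance-reduction trick shared by SVRG-style estimators.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(w, i):
    # Gradient of the single-sample loss 0.5 * (A[i] @ w - b[i])**2.
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    # Exact gradient of the averaged loss.
    return A.T @ (A @ w - b) / n

w = rng.normal(size=d)                       # current iterate
w_anchor = w + 0.05 * rng.normal(size=d)     # nearby anchor point
mu = full_grad(w_anchor)                     # exact gradient at the anchor

# Compare the variance of two unbiased single-sample estimators of
# full_grad(w): the plain one, and the control-variate-corrected one.
plain, svrg = [], []
for _ in range(2000):
    i = rng.integers(n)
    plain.append(grad_i(w, i))
    svrg.append(grad_i(w, i) - grad_i(w_anchor, i) + mu)

var_plain = np.mean(np.var(plain, axis=0))
var_svrg = np.mean(np.var(svrg, axis=0))
```

Because the correction term `grad_i(w, i) - grad_i(w_anchor, i)` shrinks as the iterate approaches the anchor, `var_svrg` is far smaller than `var_plain` while both estimators remain unbiased; fewer samples then suffice for the same estimation accuracy, which is the mechanism behind the improved sample complexities quoted above.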
Pages: 3 / 33 (31 pages)
Related papers
50 items total
  • [1] Stochastic Variance-Reduced Policy Gradient
    Papini, Matteo
    Binaghi, Damiano
    Canonaco, Giuseppe
    Pirotta, Matteo
    Restelli, Marcello
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [2] On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
    Zhang, Junyu
    Ni, Chengzhuo
    Yu, Zheng
    Szepesvari, Csaba
    Wang, Mengdi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [3] An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient
    Xu, Pan
    Gao, Felicia
    Gu, Quanquan
    [J]. 35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115 : 541 - 551
  • [4] Variance-reduced sampling importance resampling
    Xiao, Yao
    Fu, Kang
    Li, Kun
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2024,
  • [5] Variance-Reduced Methods for Machine Learning
    Gower, Robert M.
    Schmidt, Mark
    Bach, Francis
    Richtarik, Peter
    [J]. PROCEEDINGS OF THE IEEE, 2020, 108 (11) : 1968 - 1983
  • [6] Sample complexity of variance-reduced policy gradient: weaker assumptions and lower bounds
    Paczolay, Gabor
    Papini, Matteo
    Metelli, Alberto Maria
    Harmati, Istvan
    Restelli, Marcello
    [J]. MACHINE LEARNING, 2024, 113 (09) : 6475 - 6510
  • [7] Accelerating variance-reduced stochastic gradient methods
    Driggs, Derek
    Ehrhardt, Matthias J.
    Schönlieb, Carola-Bibiane
    [J]. MATHEMATICAL PROGRAMMING, 2022, 191 (02) : 671 - 715
  • [8] Stochastic Variance-Reduced Cubic Regularization Methods
    Zhou, Dongruo
    Xu, Pan
    Gu, Quanquan
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2019, 20