CONVERGENCE OF ENTROPY-REGULARIZED NATURAL POLICY GRADIENT WITH LINEAR FUNCTION APPROXIMATION

Times Cited: 0
Authors
Cayci, Semih [1 ]
He, Niao [2 ]
Srikant, R. [3 ]
Affiliations
[1] Rhein Westfal TH Aachen, Chair Math Informat Proc, D-52062 Aachen, Germany
[2] Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland
[3] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation
Keywords
reinforcement learning; policy gradient; nonconvex optimization;
DOI
10.1137/22M1540156
Chinese Library Classification
O29 [Applied Mathematics]
Discipline Code
070104
Abstract
Natural policy gradient (NPG) methods, equipped with function approximation and entropy regularization, achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the persistence of excitation condition and achieves a fast convergence rate of Õ(1/T) up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to the compatible function approximation error. Finally, we provide sample complexity results for sample-based NPG with entropy regularization.
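To make the algorithmic object of the abstract concrete, below is a minimal sketch of one entropy-regularized NPG step under log-linear softmax parameterization, i.e., pi_theta(a|s) proportional to exp(theta^T phi(s, a)). It assumes estimates of the entropy-regularized (soft) Q-function are available and fits them by least squares in the span of the features; the helper names (softmax_policy, npg_step), the estimator, and the step-size choices are illustrative assumptions, not the paper's exact procedure.

import numpy as np

def softmax_policy(theta, phi_s):
    # phi_s: (num_actions, d) matrix whose rows are the feature vectors phi(s, a).
    logits = phi_s @ theta
    logits = logits - logits.max()          # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()          # pi_theta(.|s)

def npg_step(theta, batch, eta, lam):
    # One entropy-regularized NPG update with linear function approximation.
    # batch: list of (phi_sa, q_soft) pairs, where q_soft estimates the
    #        entropy-regularized Q-value of the current policy at (s, a).
    # eta: step size; lam: entropy-regularization coefficient.
    Phi = np.stack([phi for phi, _ in batch])     # (n, d) feature matrix
    q = np.array([qv for _, qv in batch])         # (n,) soft Q estimates
    # Least-squares fit of the soft Q-function in the span of the features
    # (a stand-in for the compatible function approximation step).
    w, *_ = np.linalg.lstsq(Phi, q, rcond=None)
    # Entropy regularization shrinks the old parameters by (1 - eta * lam)
    # before adding the new direction; with lam = 0 this reduces to
    # unregularized NPG in parameter space.
    return (1.0 - eta * lam) * theta + eta * w

Intuitively, the (1 - eta*lam) shrinkage makes the iterate geometrically forget old parameters, which is the qualitative effect of entropy regularization highlighted in analyses of this kind; the paper's exact algorithm (for example, its averaging variant) should be taken from the text itself.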
Pages: 2729-2755
Page Count: 27
Related Papers
50 items total
  • [31] Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation
    Winnicki, Anna
    Srikant, R.
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 801 - 806
  • [32] Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games
    Sun, Youbang
    Liu, Tao
    Zhou, Ruida
    Kumar, P. R.
    Shahrampour, Shahin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [33] Effective Linear Policy Gradient Search through Primal-Dual Approximation
    Peng, Yiming
    Chen, Gang
    Zhang, Mengjie
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [34] Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning
    Zhou, Ruida
    Liu, Tao
    Kalathil, Dileep
    Kumar, P. R.
    Tian, Chao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [35] Finite-Sample Analysis of Off-Policy Natural Actor-Critic With Linear Function Approximation
    Chen, Zaiwei
    Khodadadian, Sajad
    Maguluri, Siva Theja
    IEEE CONTROL SYSTEMS LETTERS, 2022, 6 : 2611 - 2616
  • [37] Convergence of Batch Gradient Method Based on the Entropy Error Function for Feedforward Neural Networks
    Xiong, Yan
    Tong, Xin
    NEURAL PROCESSING LETTERS, 2020, 52 (03) : 2687 - 2695
  • [38] Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence
    Ding, Dongsheng
    Wei, Chen-Yu
    Zhang, Kaiqing
    Jovanovic, Mihailo R.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [39] Rates of convergence of performance gradient estimates using function approximation and bias in reinforcement learning
    Grudic, GZ
    Ungar, LH
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 1515 - 1522
  • [40] Cesaro Convergence of Gradient Method of Convex-Concave Function Saddle Point Approximation
    NEMIROVSKII, AS
    IUDIN, DB
    DOKLADY AKADEMII NAUK SSSR, 1978, 239 (05): 1056 - 1059