CONVERGENCE OF ENTROPY-REGULARIZED NATURAL POLICY GRADIENT WITH LINEAR FUNCTION APPROXIMATION

Times Cited: 0
Authors
Cayci, Semih [1 ]
He, Niao [2 ]
Srikant, R. [3 ]
Affiliations
[1] Rhein Westfal TH Aachen, Chair Math Informat Proc, D-52062 Aachen, Germany
[2] Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland
[3] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation
Keywords
reinforcement learning; policy gradient; nonconvex optimization;
DOI
10.1137/22M1540156
Chinese Library Classification
O29 [Applied Mathematics]
Discipline Code
070104
Abstract
Natural policy gradient (NPG) methods, equipped with function approximation and entropy regularization, achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the persistence of excitation condition and achieves a fast convergence rate of Õ(1/T) up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to the compatible function approximation error. Finally, we provide sample complexity results for sample-based NPG with entropy regularization.
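To make the algorithmic object of the abstract concrete, below is a minimal sketch of one entropy-regularized NPG step under log-linear softmax parameterization, i.e., pi_theta(a|s) proportional to exp(theta^T phi(s, a)). It assumes estimates of the entropy-regularized (soft) Q-function are available and fits them by least squares in the span of the features; the helper names (softmax_policy, npg_step), the estimator, and the step-size choices are illustrative assumptions, not the paper's exact procedure.

import numpy as np

def softmax_policy(theta, phi_s):
    # phi_s: (num_actions, d) matrix whose rows are the feature vectors phi(s, a).
    logits = phi_s @ theta
    logits = logits - logits.max()          # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()          # pi_theta(.|s)

def npg_step(theta, batch, eta, lam):
    # One entropy-regularized NPG update with linear function approximation.
    # batch: list of (phi_sa, q_soft) pairs, where q_soft estimates the
    #        entropy-regularized Q-value of the current policy at (s, a).
    # eta: step size; lam: entropy-regularization coefficient.
    Phi = np.stack([phi for phi, _ in batch])     # (n, d) feature matrix
    q = np.array([qv for _, qv in batch])         # (n,) soft Q estimates
    # Least-squares fit of the soft Q-function in the span of the features
    # (a stand-in for the compatible function approximation step).
    w, *_ = np.linalg.lstsq(Phi, q, rcond=None)
    # Entropy regularization shrinks the old parameters by (1 - eta * lam)
    # before adding the new direction; with lam = 0 this reduces to
    # unregularized NPG in parameter space.
    return (1.0 - eta * lam) * theta + eta * w

Intuitively, the (1 - eta*lam) shrinkage makes the iterate geometrically forget old parameters, which is the qualitative effect of entropy regularization highlighted in analyses of this kind; the paper's exact algorithm (for example, its averaging variant) should be taken from the text itself.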
Pages: 2729-2755
Page Count: 27
Related Papers
50 items total
  • [31] Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation
    Winnicki, Anna
    Srikant, R.
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 801 - 806
  • [32] Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games
    Sun, Youbang
    Liu, Tao
    Zhou, Ruida
    Kumar, P. R.
    Shahrampour, Shahin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [33] Effective Linear Policy Gradient Search through Primal-Dual Approximation
    Peng, Yiming
    Chen, Gang
    Zhang, Mengjie
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [34] Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning
    Zhou, Ruida
    Liu, Tao
    Kalathil, Dileep
    Kumar, P. R.
    Tian, Chao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [35] Finite-Sample Analysis of Off-Policy Natural Actor-Critic With Linear Function Approximation
    Chen, Zaiwei
    Khodadadian, Sajad
    Maguluri, Siva Theja
    IEEE CONTROL SYSTEMS LETTERS, 2022, 6 : 2611 - 2616
  • [37] Convergence of Batch Gradient Method Based on the Entropy Error Function for Feedforward Neural Networks
    Xiong, Yan
    Tong, Xin
    NEURAL PROCESSING LETTERS, 2020, 52 (03) : 2687 - 2695
  • [38] Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence
    Ding, Dongsheng
    Wei, Chen-Yu
    Zhang, Kaiqing
    Jovanovic, Mihailo R.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [39] Rates of convergence of performance gradient estimates using function approximation and bias in reinforcement learning
    Grudic, GZ
    Ungar, LH
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 1515 - 1522
  • [40] Cesaro Convergence of Gradient Method of Convex-Concave Function Saddle Point Approximation
    NEMIROVSKII, AS
    IUDIN, DB
    DOKLADY AKADEMII NAUK SSSR, 1978, 239 (05): 1056 - 1059