CONVERGENCE OF ENTROPY-REGULARIZED NATURAL POLICY GRADIENT WITH LINEAR FUNCTION APPROXIMATION

Times Cited: 0
Authors
Cayci, Semih [1 ]
He, Niao [2 ]
Srikant, R. [3 ]
Affiliations
[1] Rhein Westfal TH Aachen, Chair Math Informat Proc, D-52062 Aachen, Germany
[2] Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland
[3] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation;
Keywords
reinforcement learning; policy gradient; nonconvex optimization;
DOI
10.1137/22M1540156
CLC Classification
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
Natural policy gradient (NPG) methods, equipped with function approximation and entropy regularization, achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the persistence of excitation condition, and achieves a fast convergence rate of Õ(1/T) up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to the compatible function approximation error. Finally, we provide sample complexity results for sample-based NPG with entropy regularization.
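To illustrate the kind of update the abstract describes, the following is a minimal sketch (not the authors' algorithm or code) of entropy-regularized NPG with linear function approximation under softmax parameterization. It assumes a small hypothetical random MDP and random features; the soft Q-function of the current policy is evaluated, projected onto the feature span by least squares, and the parameters are updated as theta <- (1 - eta*tau)*theta + eta*w. All names and constants below are illustrative assumptions.

# Minimal sketch: entropy-regularized NPG with linear function approximation
# under softmax parameterization, on a small hypothetical random MDP.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, d = 5, 3, 4                      # states, actions, feature dimension (hypothetical)
gamma, tau, eta, T = 0.9, 0.1, 0.5, 200  # discount, entropy weight, step size, iterations

P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # transition kernel P[s, a, s']
r = rng.uniform(0.0, 1.0, size=(nS, nA))       # rewards
phi = rng.normal(size=(nS, nA, d))             # linear features phi(s, a)

theta = np.zeros(d)
for t in range(T):
    # Softmax-linear policy: pi(a|s) proportional to exp(theta . phi(s, a)).
    logits = phi @ theta
    logits -= logits.max(axis=1, keepdims=True)
    pi = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    # Soft (entropy-regularized) policy evaluation by iterating the soft Bellman operator:
    # Q(s,a) = r(s,a) + gamma * E_{s'}[ sum_a' pi(a'|s') (Q(s',a') - tau * log pi(a'|s')) ].
    Q = np.zeros((nS, nA))
    for _ in range(200):
        V = (pi * (Q - tau * np.log(pi + 1e-12))).sum(axis=1)  # soft state value
        Q = r + gamma * (P @ V)

    # Linear function approximation: least-squares projection of the soft Q-function
    # onto the feature span (uniform weighting over state-action pairs for simplicity).
    w, *_ = np.linalg.lstsq(phi.reshape(-1, d), Q.reshape(-1), rcond=None)

    # Entropy-regularized NPG step: shrink old parameters and move toward the fitted w.
    theta = (1.0 - eta * tau) * theta + eta * w

print("policy at the last evaluated iterate:\n", np.round(pi, 3))

The shrinkage factor (1 - eta*tau) reflects how entropy regularization keeps the logits bounded, which is what drives the persistence-of-excitation argument sketched in the abstract; the least-squares residual plays the role of the compatible function approximation error.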
Pages: 2729-2755
Page count: 27
Related Papers
50 records in total
  • [42] Evaluation of the current function in linear sweep voltammetry by Pade approximation and epsilon convergence
    Sivakumar, S
    Basha, CA
    RUSSIAN JOURNAL OF ELECTROCHEMISTRY, 2005, 41 (04) : 421 - 438
  • [43] Convergence of policy gradient for stochastic linear quadratic optimal control problems in infinite horizon
    Zhang, Xinpei
    Jia, Guangyan
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2025, 547 (01)
  • [44] Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient Method and Global Convergence
    Jansch-Porto, Joao Paulo
    Hu, Bin
    Dullerud, Geir E.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (04) : 2475 - 2482
  • [45] Accelerated and Instance-Optimal Policy Evaluation with Linear Function Approximation
    Li, Tianjiao
    Lan, Guanghui
    Pananjady, Ashwin
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2023, 5 (01): : 174 - 200
  • [46] Byzantine-Resilient Decentralized Policy Evaluation With Linear Function Approximation
    Wu, Zhaoxian
    Shen, Han
    Chen, Tianyi
    Ling, Qing
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 3839 - 3853
  • [47] Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence
    Pattathil, Sarath
    Zhang, Kaiqing
    Ozdaglar, Asuman
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [48] A novel off policy Q(λ) algorithm based on linear function approximation
    Fu, Qi-Ming
    Liu, Quan
    Wang, Hui
    Xiao, Fei
    Yu, Jun
    Li, Jiao
    Jisuanji Xuebao/Chinese Journal of Computers, 2014, 37 (03): : 677 - 686
  • [49] Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction
    Feng, Jie
    Wei, Ke
    Chen, Jinchi
    JOURNAL OF SCIENTIFIC COMPUTING, 2024, 101 (02)