CONVERGENCE OF ENTROPY-REGULARIZED NATURAL POLICY GRADIENT WITH LINEAR FUNCTION APPROXIMATION

Times Cited: 0
Authors
Cayci, Semih [1 ]
He, Niao [2 ]
Srikant, R. [3 ]
Affiliations
[1] Rhein Westfal TH Aachen, Chair Math Informat Proc, D-52062 Aachen, Germany
[2] Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland
[3] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation;
Keywords
reinforcement learning; policy gradient; nonconvex optimization;
DOI
10.1137/22M1540156
CLC Classification
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
Natural policy gradient (NPG) methods, equipped with function approximation and entropy regularization, achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the persistence of excitation condition, and achieves a fast convergence rate of Õ(1/T) up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to the compatible function approximation error. Finally, we provide sample complexity results for sample-based NPG with entropy regularization.
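As a rough illustration of the algorithmic template the abstract describes, the following Python sketch runs entropy-regularized NPG with a log-linear (softmax) policy and linear function approximation on a small synthetic MDP. It is not the paper's exact procedure: the random MDP, the feature map phi, the exact soft policy evaluation, the uniform state weighting in the least-squares step, and the update theta_{t+1} = (1 - eta*tau) * theta_t + eta * w_t are all illustrative assumptions.

# Illustrative sketch only (assumed setup, not the paper's experiments):
# entropy-regularized natural policy gradient with linear function approximation
# under a softmax (log-linear) policy on a small random MDP.
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 5, 3, 4                        # states, actions, feature dimension (assumed)
gamma, tau, eta, T = 0.9, 0.1, 0.5, 200  # discount, entropy weight, step size, iterations

P = rng.dirichlet(np.ones(S), size=(S, A))   # transitions P[s, a, s']
r = rng.uniform(size=(S, A))                 # rewards r[s, a]
phi = rng.normal(size=(S, A, d))             # features phi(s, a) in R^d

def policy(theta):
    # Softmax policy: pi(a|s) proportional to exp(phi(s, a)^T theta).
    logits = phi @ theta
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def soft_q(pi, iters=500):
    # Entropy-regularized policy evaluation by fixed-point iteration:
    # Q(s,a) = r(s,a) + gamma * E_{s'}[ sum_a' pi(a'|s') (Q(s',a') - tau * log pi(a'|s')) ].
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = (pi * (Q - tau * np.log(pi + 1e-12))).sum(axis=1)
        Q = r + gamma * (P @ V)
    return Q

theta = np.zeros(d)
for t in range(T):
    pi = policy(theta)
    Q = soft_q(pi)
    # Linear function approximation step (assumed weighting): least-squares fit of the
    # soft Q-function onto the features under the uniform state distribution and pi.
    weights = (pi / S).reshape(-1)
    X = phi.reshape(S * A, d) * np.sqrt(weights)[:, None]
    y = Q.reshape(-1) * np.sqrt(weights)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Entropy-regularized NPG update under softmax parameterization:
    # theta_{t+1} = (1 - eta * tau) * theta_t + eta * w_t.
    theta = (1 - eta * tau) * theta + eta * w

pi = policy(theta)
V = (pi * (soft_q(pi) - tau * np.log(pi + 1e-12))).sum(axis=1)
print("average soft value after training:", V.mean())

With eta * tau in (0, 1], the parameter update is a convex combination of the previous parameter and the least-squares fit of the soft Q-function, which is the usual soft-policy-iteration form of entropy-regularized NPG with log-linear policies.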
Pages: 2729-2755
Page count: 27
Related Papers
50 records in total
  • [1] Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
    Ged, Francois G.
    Veiga, Maria Han
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [2] Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization
    Sun, Youbang
    Liu, Tao
    Kumar, P. R.
    Shahrampour, Shahin
    IEEE CONTROL SYSTEMS LETTERS, 2024, 8 : 1217 - 1222
  • [3] Local Analysis of Entropy-Regularized Stochastic Soft-Max Policy Gradient Methods
    Ding, Yuhao
    Zhang, Junzi
    Lavaei, Javad
    2023 EUROPEAN CONTROL CONFERENCE, ECC, 2023
  • [4] Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime
    Kerimkulov, Bekzhan
    Leahy, James-Michael
    Siska, David
    Szpruch, Lukasz
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [5] On the Linear Convergence of Natural Policy Gradient Algorithm
    Khodadadian, Sajad
    Jhunjhunwala, Prakirt Raj
    Varma, Sushil Mahavir
    Maguluri, Siva Theja
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 3794 - 3799
  • [6] Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation
    Guan, Yue
    Zhang, Qifan
    Tsiotras, Panagiotis
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2462 - 2468
  • [7] Convergence Rates of Smooth Message Passing with Rounding in Entropy-Regularized MAP Inference
    Lee, Jonathan N.
    Pacchiano, Aldo
    Jordan, Michael I.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 3003 - 3013
  • [8] Convergence rate of entropy-regularized multi-marginal optimal transport costs
    Nenna, Luca
    Pegon, Paul
    CANADIAN JOURNAL OF MATHEMATICS-JOURNAL CANADIEN DE MATHEMATIQUES, 2024,
  • [9] On linear and super-linear convergence of Natural Policy Gradient algorithm
    Khodadadian, Sajad
    Jhunjhunwala, Prakirt Raj
    Varma, Sushil Mahavir
    Maguluri, Siva Theja
    SYSTEMS & CONTROL LETTERS, 2022, 164
  • [10] Distributed entropy-regularized multi-agent reinforcement learning with policy consensus
    Hu, Yifan
    Fu, Junjie
    Wen, Guanghui
    Lv, Yuezu
    Ren, Wei
    AUTOMATICA, 2024, 164