CONVERGENCE OF ENTROPY-REGULARIZED NATURAL POLICY GRADIENT WITH LINEAR FUNCTION APPROXIMATION

Cited by: 0
Authors
Cayci, Semih [1 ]
He, Niao [2 ]
Srikant, R. [3 ]
Affiliations
[1] RWTH Aachen University, Chair of Mathematics of Information Processing, D-52062 Aachen, Germany
[2] ETH Zurich, Department of Computer Science, CH-8092 Zurich, Switzerland
[3] University of Illinois at Urbana-Champaign, Coordinated Science Laboratory, Department of Electrical and Computer Engineering, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation;
Keywords
reinforcement learning; policy gradient; nonconvex optimization;
DOI
10.1137/22M1540156
CLC classification
O29 [Applied Mathematics];
Discipline code
070104;
Abstract
Natural policy gradient (NPG) methods, equipped with function approximation and entropy regularization, achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the persistence of excitation condition and achieves a fast convergence rate of $\widetilde{O}(1/T)$ up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to the compatible function approximation error. Finally, we provide sample complexity results for sample-based NPG with entropy regularization.
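For orientation, the following is a minimal sketch of a standard entropy-regularized NPG iteration with log-linear (softmax) policies and linear function approximation, in the spirit of the setting described in the abstract. The symbols below (feature map $\phi$, temperature $\tau$, step size $\eta$, fitted weights $w_t$) are generic textbook notation and are not taken verbatim from the paper.

% Sketch of the standard formulation; details and notation may differ from the paper's.
% Log-linear (softmax) policy over features \phi(s,a) \in \mathbb{R}^d and
% entropy-regularized value functions with temperature \tau > 0:
\[
\pi_\theta(a \mid s) \;=\; \frac{\exp\!\big(\theta^\top \phi(s,a)\big)}{\sum_{a'} \exp\!\big(\theta^\top \phi(s,a')\big)},
\qquad
V_\tau^{\pi}(s) \;=\; \mathbb{E}\!\left[\sum_{t \ge 0} \gamma^t \big(r(s_t,a_t) - \tau \log \pi(a_t \mid s_t)\big) \;\middle|\; s_0 = s\right],
\]
\[
Q_\tau^{\pi}(s,a) \;=\; r(s,a) \;+\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\!\big[V_\tau^{\pi}(s')\big].
\]
One common variant of the iteration regresses the soft Q-function onto the feature span (the compatible function approximation step) and then moves the parameters toward the fitted weights, which under this parameterization takes the form of a convex combination:
\[
w_t \;\in\; \arg\min_{w \in \mathbb{R}^d} \; \mathbb{E}_{(s,a) \sim d_t}\!\Big[\big(w^\top \phi(s,a) - Q_\tau^{\pi_{\theta_t}}(s,a)\big)^2\Big],
\qquad
\theta_{t+1} \;=\; (1 - \eta\tau)\,\theta_t \;+\; \eta\, w_t,
\]
so that $\pi_{\theta_{t+1}}(\cdot \mid s) \propto \pi_{\theta_t}(\cdot \mid s)^{\,1-\eta\tau} \exp\!\big(\eta\, w_t^\top \phi(s,\cdot)\big)$. The $(1-\eta\tau)$ shrinkage induced by entropy regularization is typically what drives fast rates of the kind stated in the abstract, up to the error incurred in the regression step.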
Pages: 2729-2755
Page count: 27