Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime

Cited by: 0
Authors
Kerimkulov, Bekzhan [1 ]
Leahy, James-Michael [2 ]
Siska, David [1 ,3 ]
Szpruch, Lukasz [1 ,4 ]
Affiliations
[1] Univ Edinburgh, Sch Math, Edinburgh, Midlothian, Scotland
[2] Imperial Coll London, Dept Math, London, England
[3] Vega Protocol, Gibraltar, Gibraltar
[4] Alan Turing Inst, London, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
LEVEL;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
We study the global convergence of policy gradient for infinite-horizon, entropy-regularized Markov decision processes (MDPs) with continuous state and action spaces. We consider a softmax policy with (one-hidden-layer) neural network approximation in a mean-field regime. An additional entropic regularization is imposed on the associated mean-field probability measure, and the corresponding gradient flow is studied in the 2-Wasserstein metric. We show that the objective function is increasing along the gradient flow. Further, we prove that if the regularization in terms of the mean-field measure is sufficiently strong, the gradient flow converges exponentially fast to the unique stationary solution, which is the unique maximizer of the regularized MDP objective. Lastly, we study the sensitivity of the value function along the gradient flow with respect to the regularization parameters and the initial condition. Our results rely on a careful analysis of the non-linear Fokker-Planck-Kolmogorov equation and extend the pioneering works of Mei et al. (2020) and Agarwal et al. (2020), which quantify the global convergence rate of policy gradient for entropy-regularized MDPs in the tabular setting.
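For orientation, the following is a minimal illustrative sketch of the setup described in the abstract, in generic notation (f, m, tau, sigma) that is not necessarily the paper's own. The policy is a softmax over a mean-field network, e.g.

\[
\pi_m(a \mid s) \;\propto\; \exp\!\Big(\frac{1}{\tau}\int f(\theta; s, a)\, m(\mathrm{d}\theta)\Big),
\]

where f is a one-hidden-layer network with parameters \theta, m is a probability measure over \theta, and \tau > 0 is a temperature. With an entropic penalty of strength \sigma > 0 on m, the 2-Wasserstein gradient flow of the regularized objective V takes the form of a non-linear Fokker-Planck-Kolmogorov equation,

\[
\partial_t m_t \;=\; -\,\nabla_\theta \cdot \Big(m_t\, \nabla_\theta \frac{\delta V}{\delta m}(m_t, \cdot)\Big) \;+\; \sigma\, \Delta_\theta m_t,
\]

whose unique stationary solution is, for sufficiently strong regularization, approached exponentially fast. In practice a flow of this type is simulated with interacting particles (mean-field Langevin dynamics). The Python sketch below is a hypothetical, self-contained toy: the per-particle loss is a stand-in for the (negated) flat derivative \delta V / \delta m, which in the paper involves the MDP value function and has no closed form.

import numpy as np

rng = np.random.default_rng(0)

def grad_f(theta):
    # Gradient of a toy per-particle loss f(theta) = |theta|^2 / 2,
    # a stand-in for -grad_theta (delta V / delta m); the paper's
    # flat derivative involves the MDP value function instead.
    return theta

N, d = 1000, 2          # number of particles, parameter dimension
eta, sigma = 1e-2, 0.5  # step size, entropic regularization strength
theta = rng.normal(size=(N, d))  # initial particle cloud sampled from m_0

for _ in range(2000):
    noise = rng.normal(size=(N, d))
    # Euler-Maruyama step of d theta = -grad_f(theta) dt + sqrt(2 sigma) dW,
    # the particle version of the Fokker-Planck-Kolmogorov flow above.
    theta = theta - eta * grad_f(theta) + np.sqrt(2.0 * eta * sigma) * noise

# The empirical measure of `theta` approximates the Gibbs measure
# proportional to exp(-f/sigma): here a Gaussian with variance sigma.
print(theta.var(axis=0))  # approximately [0.5, 0.5]

For this toy loss the particle cloud settles exponentially fast at the Gibbs measure, mirroring the convergence statement in the abstract.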
Pages: 31