Deterministic Policy Gradient: Convergence Analysis

Times Cited: 0
Authors
Xiong, Huaqing [1 ]
Xu, Tengyu [1 ]
Zhao, Lin [2 ]
Liang, Yingbin [1 ]
Zhang, Wei [3 ]
Affiliations
[1] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH USA
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Southern Univ Sci & Technol SUSTech, Dept Mech & Energy Engn, Shenzhen, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China; US National Science Foundation;
Keywords
MARKOV; APPROXIMATION; ALGORITHMS;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The deterministic policy gradient (DPG) method proposed in Silver et al. [2014] has been demonstrated to exhibit superior performance, particularly for applications with multi-dimensional and continuous action spaces. However, it remains unclear whether DPG converges, and if so, how fast it converges and whether it converges as efficiently as other PG methods. In this paper, we provide a theoretical analysis of DPG to answer these questions. We study the single-timescale DPG (often the case in practice) in both on-policy and off-policy settings, and show that both algorithms attain an ε-accurate stationary policy, up to a system error, with a sample complexity of O(ε^{-2}). Moreover, we establish the convergence rate for DPG under Gaussian noise exploration, which is widely adopted in practice to improve the performance of DPG. To the best of our knowledge, this is the first non-asymptotic convergence characterization for DPG methods.
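As a concrete illustration of the objects named in the abstract, the sketch below runs a single-timescale DPG loop with Gaussian-noise exploration: actor and critic are each updated once per sample with fixed step sizes of comparable magnitude, rather than nesting a full critic evaluation inside every actor step. The toy scalar environment, the feature map phi, and the step sizes alpha/beta are assumptions made purely for illustration; this is not the construction analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy scalar linear-quadratic environment (an illustrative assumption;
# the paper's analysis is not tied to this system).
def env_step(s, a):
    s_next = 0.9 * s + a + 0.01 * rng.standard_normal()
    reward = -(s**2 + 0.1 * a**2)  # quadratic cost as negative reward
    return s_next, reward

# Quadratic critic features so Q_w can represent an LQR-style Q-function.
def phi(s, a):
    return np.array([1.0, s * s, a * a, s * a])

theta = 0.0                # deterministic policy mu_theta(s) = theta * s
w = np.zeros(4)            # critic weights, Q_w(s, a) = w @ phi(s, a)
alpha, beta = 1e-3, 1e-2   # single timescale: fixed actor/critic step sizes
gamma, sigma = 0.95, 0.3   # discount factor, Gaussian exploration std

s = rng.standard_normal()
for _ in range(20_000):
    # Behavior policy: deterministic action plus Gaussian exploration noise.
    a = theta * s + sigma * rng.standard_normal()
    s_next, r = env_step(s, a)
    a_next = theta * s_next  # greedy action of the current policy

    # Critic: one TD(0) step on Q_w.
    td_error = r + gamma * (w @ phi(s_next, a_next)) - w @ phi(s, a)
    w += beta * td_error * phi(s, a)

    # Actor: deterministic policy gradient step,
    # grad_theta mu_theta(s) * grad_a Q_w(s, a) at a = mu_theta(s).
    # With phi = [1, s^2, a^2, s*a]: grad_a Q = 2*w[2]*a + w[3]*s.
    grad_a_Q = 2.0 * w[2] * (theta * s) + w[3] * s
    theta += alpha * s * grad_a_Q

    s = float(np.clip(s_next, -10.0, 10.0))  # safeguard against divergence

print(f"learned feedback gain theta ~ {theta:.3f}")
```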
Pages: 2159-2169
Number of pages: 11
Related Papers
50 records
  • [1] Deterministic Policy Gradient Algorithms
    Silver, David
    Lever, Guy
    Heess, Nicolas
    Degris, Thomas
    Wierstra, Daan
    Riedmiller, Martin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
  • [2] Proximal Deterministic Policy Gradient
    Maggipinto, Marco
    Susto, Gian Antonio
    Chaudhari, Pratik
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 5438 - 5444
  • [3] Deterministic Convergence of an Online Gradient Method with Momentum
    Zhang, Naimin
    INTELLIGENT COMPUTING, PART I: INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING, ICIC 2006, PART I, 2006, 4113 : 94 - 105
  • [4] Feature selection in deterministic policy gradient
    Li, Luntong
    Li, Dazi
    Song, Tianheng
JOURNAL OF ENGINEERING-JOE, 2020, 2020 (13) : 403 - 406
  • [5] Policy Space Noise in Deep Deterministic Policy Gradient
    Yan, Yan
    Liu, Quan
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302 : 624 - 634
  • [7] A note on the convergence of deterministic gradient sampling in nonsmooth optimization
Gebken, Bennet
COMPUTATIONAL OPTIMIZATION AND APPLICATIONS, 2024, 88 : 151 - 165
  • [8] Deterministic convergence of an online gradient method for neural networks
    Wu, W
    Xu, YS
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2002, 144 (1-2) : 335 - 347
  • [9] Mutual Deep Deterministic Policy Gradient Learning
    Sun, Zhou
    2022 INTERNATIONAL CONFERENCE ON BIG DATA, INFORMATION AND COMPUTER NETWORK (BDICN 2022), 2022, : 508 - 513
  • [10] Deep Deterministic Policy Gradient for Portfolio Management
    Khemlichi, Firdaous
    Chougrad, Hiba
    Khamlichi, Youness Idrissi
    El Boushaki, Abdessamad
    Ben Ali, Safae Elhaj
    2020 6TH IEEE CONGRESS ON INFORMATION SCIENCE AND TECHNOLOGY (IEEE CIST'20), 2020, : 424 - 429