Deterministic Policy Gradient: Convergence Analysis

Cited by: 0
Authors
Xiong, Huaqing [1 ]
Xu, Tengyu [1 ]
Zhao, Lin [2 ]
Liang, Yingbin [1 ]
Zhang, Wei [3 ]
Affiliations
[1] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH USA
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Southern Univ Sci & Technol SUSTech, Dept Mech & Energy Engn, Shenzhen, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China; US National Science Foundation
Keywords
MARKOV; APPROXIMATION; ALGORITHMS;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The deterministic policy gradient (DPG) method proposed in Silver et al. [2014] has been demonstrated to exhibit superior performance, particularly for applications with multi-dimensional and continuous action spaces. However, it remains unclear whether DPG converges, and if so, how fast it converges and whether it converges as efficiently as other PG methods. In this paper, we provide a theoretical analysis of DPG to answer these questions. We study the single-timescale DPG (as is often used in practice) in both on-policy and off-policy settings, and show that both algorithms attain an ε-accurate stationary policy, up to a system error, with a sample complexity of O(ε^{-2}). Moreover, we establish the convergence rate for DPG under Gaussian noise exploration, which is widely adopted in practice to improve the performance of DPG. To the best of our knowledge, this is the first non-asymptotic convergence characterization for DPG methods.
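For intuition about the algorithm analyzed above, below is a minimal sketch of a single-timescale DPG loop with Gaussian exploration noise: the critic takes one TD(0) step and the actor takes one deterministic-policy-gradient step per sample, with fixed step sizes on the same timescale. The linear features, the toy dynamics in env_step, and all hyperparameters are illustrative assumptions, not the setting of the paper.

```python
# Minimal single-timescale DPG sketch with Gaussian exploration noise.
# Illustrative only: the environment, features, and step sizes below are
# assumptions for demonstration, not the exact setup analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)

S_DIM, A_DIM = 3, 1
theta = np.zeros((S_DIM, A_DIM))      # linear deterministic policy: mu(s) = theta^T s
w = np.zeros(S_DIM * A_DIM)           # linear critic: Q_w(s, a) = w^T phi(s, a)
alpha, beta = 1e-3, 1e-2              # actor / critic step sizes (single timescale)
gamma, sigma = 0.95, 0.1              # discount factor, exploration noise std

def phi(s, a):
    # bilinear state-action features (an assumption made for illustration)
    return np.outer(s, a).ravel()

def mu(s):
    return theta.T @ s                # deterministic policy

def env_step(s, a):
    # stand-in linear dynamics with a quadratic cost (hypothetical environment)
    s_next = 0.9 * s + 0.1 * a.sum() + 0.01 * rng.standard_normal(S_DIM)
    r = -float(s @ s) - 0.1 * float(a @ a)
    return s_next, r

s = rng.standard_normal(S_DIM)
for t in range(10_000):
    a = mu(s) + sigma * rng.standard_normal(A_DIM)   # Gaussian noise exploration
    s_next, r = env_step(s, a)

    # Critic: one TD(0) step toward r + gamma * Q_w(s', mu(s')).
    td_err = r + gamma * w @ phi(s_next, mu(s_next)) - w @ phi(s, a)
    w += beta * td_err * phi(s, a)

    # Actor: deterministic policy gradient,
    # grad_theta J ~ grad_theta mu(s) * grad_a Q_w(s, a)|_{a = mu(s)}.
    grad_a_Q = w.reshape(S_DIM, A_DIM).T @ s         # d/da of w^T phi(s, a)
    theta += alpha * np.outer(s, grad_a_Q)

    s = s_next
```

Note that both theta and w are updated once per sample with fixed step sizes; this one-critic-step-per-actor-step structure is the single-timescale regime referred to in the abstract, as opposed to two-timescale schemes in which the critic runs on a much faster effective timescale.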
Pages: 2159-2169
Page count: 11
Related Papers
50 in total (entries [31]-[40] shown)
  • [31] Stability Analysis for Autonomous Vehicle Navigation Trained over Deep Deterministic Policy Gradient
    Cabezas-Olivenza, Mireya
    Zulueta, Ekaitz
    Sanchez-Chica, Ander
    Fernandez-Gamiz, Unai
    Teso-Fz-Betono, Adrian
    MATHEMATICS, 2023, 11 (01)
  • [32] Geometry and convergence of natural policy gradient methods
    Müller, J.
    Montúfar, G.
    INFORMATION GEOMETRY, 2024, 7 (Suppl 1) : 485 - 523
  • [33] On the Linear Convergence of Natural Policy Gradient Algorithm
    Khodadadian, Sajad
    Jhunjhunwala, Prakirt Raj
    Varma, Sushil Mahavir
    Maguluri, Siva Theja
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021 : 3794 - 3799
  • [34] Deterministic Gradient-Descent Learning of Linear Regressions: Adaptive Algorithms, Convergence Analysis and Noise Compensation
    Liu, Kang-Zhi
    Gan, Chao
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 7867 - 7877
  • [35] Guided deterministic policy optimization with gradient-free policy parameters information
    Shen, Chun
    Zhu, Sheng
    Han, Shuai
    Gong, Xiaoyu
    Lu, Shuai
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 231
  • [36] State Representation Learning for Minimax Deep Deterministic Policy Gradient
    Hu, Dapeng
    Jiang, Xuesong
    Wei, Xiumei
    Wang, Jian
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2019, PT I, 2019, 11775 : 481 - 487
  • [37] Deep Deterministic Policy Gradient With Prioritized Sampling for Power Control
    Zhou, Shiyang
    Cheng, Yufan
    Lei, Xia
    Duan, Huanhuan
    IEEE ACCESS, 2020, 8 : 194240 - 194250
  • [38] Controlling Bicycle Using Deep Deterministic Policy Gradient Algorithm
    Le Pham Tuyen
    Chung, TaeChoong
    2017 14TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI), 2017 : 413 - 417
  • [39] Semicentralized Deep Deterministic Policy Gradient in Cooperative StarCraft Games
    Xie, Dong
    Zhong, Xiangnan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (04) : 1584 - 1593
  • [40] Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms
    Zhang, Haifei
    Xu, Jian
    Zhang, Jian
    Liu, Quan
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022