Deterministic Policy Gradient: Convergence Analysis

Times cited: 0
Authors
Xiong, Huaqing [1 ]
Xu, Tengyu [1 ]
Zhao, Lin [2 ]
Liang, Yingbin [1 ]
Zhang, Wei [3 ]
Affiliations
[1] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH USA
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Southern Univ Sci & Technol SUSTech, Dept Mech & Energy Engn, Shenzhen, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China; US National Science Foundation
Keywords
MARKOV; APPROXIMATION; ALGORITHMS
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The deterministic policy gradient (DPG) method proposed in Silver et al. [2014] has been demonstrated to achieve superior performance, particularly in applications with multi-dimensional and continuous action spaces. However, it remains unclear whether DPG converges and, if so, how fast it converges and whether its convergence is as efficient as that of other PG methods. In this paper, we provide a theoretical analysis of DPG to answer these questions. We study single-timescale DPG (often the case in practice) in both on-policy and off-policy settings, and show that both algorithms attain an ε-accurate stationary policy, up to a system error, with a sample complexity of O(ε⁻²). Moreover, we establish the convergence rate for DPG under Gaussian noise exploration, which is widely adopted in practice to improve the performance of DPG. To the best of our knowledge, this is the first non-asymptotic convergence characterization for DPG methods.
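To make the object of the analysis concrete, below is a minimal Python sketch of the DPG actor update with Gaussian exploration noise on a one-step toy problem. It is a sketch under stated assumptions: the quadratic reward, the linear policy, and all names in it (w_star, sigma_explore, lr) are illustrative, not constructions from the paper, and the true grad_a Q stands in for a learned critic.

import numpy as np

# Minimal sketch of the DPG update rule of Silver et al. [2014] on a
# one-step toy problem, with Gaussian exploration noise as used in
# practice. All quantities here are illustrative assumptions, not
# constructions from the paper.

rng = np.random.default_rng(0)

w_star = 2.0         # reward r(s, a) = -(a - w_star * s)^2 peaks at a = w_star * s
theta = 0.0          # deterministic linear policy: mu_theta(s) = theta * s
sigma_explore = 0.3  # std of the Gaussian exploration noise (behavior policy)
lr = 0.05            # one step size for everything (single-timescale flavor)

for t in range(500):
    s = rng.normal()  # sample a state
    a_behavior = theta * s + sigma_explore * rng.normal()
    # In the full actor-critic algorithm, (s, a_behavior, r) would drive
    # the critic update; since grad_a Q is known in this toy problem,
    # a_behavior only illustrates the Gaussian behavior policy.

    # DPG theorem: grad_theta J = E_s[grad_theta mu_theta(s)
    #                                 * grad_a Q(s, a) at a = mu_theta(s)].
    # Here grad_theta mu_theta(s) = s and grad_a Q(s, a) = -2 * (a - w_star * s).
    g = s * (-2.0 * (theta * s - w_star * s))
    theta += lr * g

print(f"learned theta = {theta:.3f} (optimal gain {w_star})")

Because the true action-value gradient is supplied, the recursion isolates the actor step; the paper's analysis additionally tracks the critic and system error, which is where the single-timescale coupling and the O(ε⁻²) sample complexity enter.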
Pages: 2159-2169
Number of pages: 11
Related papers
50 records in total (entries [41]-[50] shown)
  • [41] Network Architecture Reasoning via Deep Deterministic Policy Gradient
    Liu, Huidong
    Du, Fang
    Tang, Xiaofen
    Liu, Hao
    Yu, Zhenhua
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020
  • [42] A Method of Attitude Control Based on Deep Deterministic Policy Gradient
    Zhang, Jian
    Wu, Fengge
    Zhao, Junsuo
    Xu, Fanjiang
    COGNITIVE SYSTEMS AND SIGNAL PROCESSING, PT II, 2019, 1006 : 197 - 207
  • [43] Multilayer Deep Deterministic Policy Gradient for Static Safety and Stability Analysis of Novel Power Systems
    Long, Yun
    Lu, Youfei
    Zhao, Hongwei
    Wu, Renbo
    Bao, Tao
    Liu, Jun
    INTERNATIONAL TRANSACTIONS ON ELECTRICAL ENERGY SYSTEMS, 2023, 2023
  • [44] Dynamical Motor Control Learned with Deep Deterministic Policy Gradient
    Shi, Haibo
    Sun, Yaoru
    Li, Jie
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2018, 2018
  • [45] Byzantine-Robust Federated Deep Deterministic Policy Gradient
    Lin, Qifeng
    Ling, Qing
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022 : 4013 - 4017
  • [46] Target tracking strategy using deep deterministic policy gradient
    You, Shixun
    Diao, Ming
    Gao, Lipeng
    Zhang, Fulong
    Wang, Huan
    APPLIED SOFT COMPUTING, 2020, 95
  • [47] Duplicated Replay Buffer for Asynchronous Deep Deterministic Policy Gradient
    Motehayeri, Seyed Mohammad Seyed
    Baghi, Vahid
    Miandoab, Ehsan Maani
    Moeini, Ali
    2021 26TH INTERNATIONAL COMPUTER CONFERENCE, COMPUTER SOCIETY OF IRAN (CSICC), 2021
  • [48] Deterministic Policy Gradient With Integral Compensator for Robust Quadrotor Control
    Wang, Yuanda
    Sun, Jia
    He, Haibo
    Sun, Changyin
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (10) : 3713 - 3725
  • [49] Optimal Trade Execution Based on Deep Deterministic Policy Gradient
    Ye, Zekun
    Deng, Weijie
    Zhou, Shuigeng
    Xu, Yi
    Guan, Jihong
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT I, 2020, 12112 : 638 - 654
  • [50] Compensation Control of UAV Based on Deep Deterministic Policy Gradient
    Xu, Zijun
    Qi, Juntong
    Wang, Mingming
    Wu, Chong
    Yang, Guang
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 2289 - 2296