Deterministic Policy Gradient: Convergence Analysis

Times cited: 0
Authors
Xiong, Huaqing [1 ]
Xu, Tengyu [1 ]
Zhao, Lin [2 ]
Liang, Yingbin [1 ]
Zhang, Wei [3 ]
Affiliations
[1] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH USA
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Southern Univ Sci & Technol SUSTech, Dept Mech & Energy Engn, Shenzhen, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China; US National Science Foundation
Keywords
MARKOV; APPROXIMATION; ALGORITHMS
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The deterministic policy gradient (DPG) method proposed in Silver et al. [2014] has been demonstrated to achieve superior performance, particularly in applications with multi-dimensional and continuous action spaces. However, it remains unclear whether DPG converges and, if so, how fast it converges and whether its convergence is as efficient as that of other PG methods. In this paper, we provide a theoretical analysis of DPG to answer these questions. We study single-timescale DPG (often the case in practice) in both on-policy and off-policy settings, and show that both algorithms attain an ε-accurate stationary policy, up to a system error, with a sample complexity of O(ε⁻²). Moreover, we establish the convergence rate for DPG under Gaussian noise exploration, which is widely adopted in practice to improve the performance of DPG. To the best of our knowledge, this is the first non-asymptotic convergence characterization for DPG methods.
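To make the object of the analysis concrete, below is a minimal Python sketch of the DPG actor update with Gaussian exploration noise on a one-step toy problem. It is a sketch under stated assumptions: the quadratic reward, the linear policy, and all names in it (w_star, sigma_explore, lr) are illustrative, not constructions from the paper, and the true grad_a Q stands in for a learned critic.

import numpy as np

# Minimal sketch of the DPG update rule of Silver et al. [2014] on a
# one-step toy problem, with Gaussian exploration noise as used in
# practice. All quantities here are illustrative assumptions, not
# constructions from the paper.

rng = np.random.default_rng(0)

w_star = 2.0         # reward r(s, a) = -(a - w_star * s)^2 peaks at a = w_star * s
theta = 0.0          # deterministic linear policy: mu_theta(s) = theta * s
sigma_explore = 0.3  # std of the Gaussian exploration noise (behavior policy)
lr = 0.05            # one step size for everything (single-timescale flavor)

for t in range(500):
    s = rng.normal()  # sample a state
    a_behavior = theta * s + sigma_explore * rng.normal()
    # In the full actor-critic algorithm, (s, a_behavior, r) would drive
    # the critic update; since grad_a Q is known in this toy problem,
    # a_behavior only illustrates the Gaussian behavior policy.

    # DPG theorem: grad_theta J = E_s[grad_theta mu_theta(s)
    #                                 * grad_a Q(s, a) at a = mu_theta(s)].
    # Here grad_theta mu_theta(s) = s and grad_a Q(s, a) = -2 * (a - w_star * s).
    g = s * (-2.0 * (theta * s - w_star * s))
    theta += lr * g

print(f"learned theta = {theta:.3f} (optimal gain {w_star})")

Because the true action-value gradient is supplied, the recursion isolates the actor step; the paper's analysis additionally tracks the critic and system error, which is where the single-timescale coupling and the O(ε⁻²) sample complexity enter.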
Pages: 2159-2169
Number of pages: 11
Related papers
50 records in total (entries [41]-[50] shown)
  • [41] Network Architecture Reasoning via Deep Deterministic Policy Gradient
    Liu, Huidong
    Du, Fang
    Tang, Xiaofen
    Liu, Hao
    Yu, Zhenhua
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020
  • [42] A Method of Attitude Control Based on Deep Deterministic Policy Gradient
    Zhang, Jian
    Wu, Fengge
    Zhao, Junsuo
    Xu, Fanjiang
    COGNITIVE SYSTEMS AND SIGNAL PROCESSING, PT II, 2019, 1006 : 197 - 207
  • [43] Multilayer Deep Deterministic Policy Gradient for Static Safety and Stability Analysis of Novel Power Systems
    Long, Yun
    Lu, Youfei
    Zhao, Hongwei
    Wu, Renbo
    Bao, Tao
    Liu, Jun
    INTERNATIONAL TRANSACTIONS ON ELECTRICAL ENERGY SYSTEMS, 2023, 2023
  • [44] Dynamical Motor Control Learned with Deep Deterministic Policy Gradient
    Shi, Haibo
    Sun, Yaoru
    Li, Jie
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2018, 2018
  • [45] Byzantine-Robust Federated Deep Deterministic Policy Gradient
    Lin, Qifeng
    Ling, Qing
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022 : 4013 - 4017
  • [46] Target tracking strategy using deep deterministic policy gradient
    You, Shixun
    Diao, Ming
    Gao, Lipeng
    Zhang, Fulong
    Wang, Huan
    APPLIED SOFT COMPUTING, 2020, 95
  • [47] Duplicated Replay Buffer for Asynchronous Deep Deterministic Policy Gradient
    Motehayeri, Seyed Mohammad Seyed
    Baghi, Vahid
    Miandoab, Ehsan Maani
    Moeini, Ali
    2021 26TH INTERNATIONAL COMPUTER CONFERENCE, COMPUTER SOCIETY OF IRAN (CSICC), 2021
  • [48] Deterministic Policy Gradient With Integral Compensator for Robust Quadrotor Control
    Wang, Yuanda
    Sun, Jia
    He, Haibo
    Sun, Changyin
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (10) : 3713 - 3725
  • [49] Optimal Trade Execution Based on Deep Deterministic Policy Gradient
    Ye, Zekun
    Deng, Weijie
    Zhou, Shuigeng
    Xu, Yi
    Guan, Jihong
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT I, 2020, 12112 : 638 - 654
  • [50] Compensation Control of UAV Based on Deep Deterministic Policy Gradient
    Xu, Zijun
    Qi, Juntong
    Wang, Mingming
    Wu, Chong
    Yang, Guang
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 2289 - 2296