Deterministic Policy Gradient: Convergence Analysis

Times Cited: 0
Authors
Xiong, Huaqing [1 ]
Xu, Tengyu [1 ]
Zhao, Lin [2 ]
Liang, Yingbin [1 ]
Zhang, Wei [3 ]
Affiliations
[1] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH USA
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Southern Univ Sci & Technol SUSTech, Dept Mech & Energy Engn, Shenzhen, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China; US National Science Foundation;
Keywords
MARKOV; APPROXIMATION; ALGORITHMS;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The deterministic policy gradient (DPG) method proposed in Silver et al. [2014] has been demonstrated to exhibit superior performance, particularly for applications with multi-dimensional and continuous action spaces. However, it remains unclear whether DPG converges, and if so, how fast it converges and whether it converges as efficiently as other PG methods. In this paper, we provide a theoretical analysis of DPG to answer these questions. We study the single-timescale DPG (often the case in practice) in both on-policy and off-policy settings, and show that both algorithms attain an ε-accurate stationary policy, up to a system error, with a sample complexity of O(ε^{-2}). Moreover, we establish the convergence rate for DPG under Gaussian noise exploration, which is widely adopted in practice to improve the performance of DPG. To the best of our knowledge, this is the first non-asymptotic convergence characterization for DPG methods.
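As a concrete illustration of the objects named in the abstract, the sketch below runs a single-timescale DPG loop with Gaussian-noise exploration: actor and critic are each updated once per sample with fixed step sizes of comparable magnitude, rather than nesting a full critic evaluation inside every actor step. The toy scalar environment, the feature map phi, and the step sizes alpha/beta are assumptions made purely for illustration; this is not the construction analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy scalar linear-quadratic environment (an illustrative assumption;
# the paper's analysis is not tied to this system).
def env_step(s, a):
    s_next = 0.9 * s + a + 0.01 * rng.standard_normal()
    reward = -(s**2 + 0.1 * a**2)  # quadratic cost as negative reward
    return s_next, reward

# Quadratic critic features so Q_w can represent an LQR-style Q-function.
def phi(s, a):
    return np.array([1.0, s * s, a * a, s * a])

theta = 0.0                # deterministic policy mu_theta(s) = theta * s
w = np.zeros(4)            # critic weights, Q_w(s, a) = w @ phi(s, a)
alpha, beta = 1e-3, 1e-2   # single timescale: fixed actor/critic step sizes
gamma, sigma = 0.95, 0.3   # discount factor, Gaussian exploration std

s = rng.standard_normal()
for _ in range(20_000):
    # Behavior policy: deterministic action plus Gaussian exploration noise.
    a = theta * s + sigma * rng.standard_normal()
    s_next, r = env_step(s, a)
    a_next = theta * s_next  # greedy action of the current policy

    # Critic: one TD(0) step on Q_w.
    td_error = r + gamma * (w @ phi(s_next, a_next)) - w @ phi(s, a)
    w += beta * td_error * phi(s, a)

    # Actor: deterministic policy gradient step,
    # grad_theta mu_theta(s) * grad_a Q_w(s, a) at a = mu_theta(s).
    # With phi = [1, s^2, a^2, s*a]: grad_a Q = 2*w[2]*a + w[3]*s.
    grad_a_Q = 2.0 * w[2] * (theta * s) + w[3] * s
    theta += alpha * s * grad_a_Q

    s = float(np.clip(s_next, -10.0, 10.0))  # safeguard against divergence

print(f"learned feedback gain theta ~ {theta:.3f}")
```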
Pages: 2159-2169
Number of pages: 11
Related Papers
50 records
  • [1] Deterministic Policy Gradient Algorithms
    Silver, David
    Lever, Guy
    Heess, Nicolas
    Degris, Thomas
    Wierstra, Daan
    Riedmiller, Martin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
  • [2] Proximal Deterministic Policy Gradient
    Maggipinto, Marco
    Susto, Gian Antonio
    Chaudhari, Pratik
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 5438 - 5444
  • [3] Deterministic Convergence of an Online Gradient Method with Momentum
    Zhang, Naimin
    INTELLIGENT COMPUTING, PART I: INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING, ICIC 2006, PART I, 2006, 4113 : 94 - 105
  • [4] Feature selection in deterministic policy gradient
    Li, Luntong
    Li, Dazi
    Song, Tianheng
JOURNAL OF ENGINEERING-JOE, 2020, 2020 (13) : 403 - 406
  • [5] Policy Space Noise in Deep Deterministic Policy Gradient
    Yan, Yan
    Liu, Quan
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302 : 624 - 634
  • [7] A note on the convergence of deterministic gradient sampling in nonsmooth optimization
Gebken, Bennet
COMPUTATIONAL OPTIMIZATION AND APPLICATIONS, 2024, 88 : 151 - 165
  • [8] Deterministic convergence of an online gradient method for neural networks
    Wu, W
    Xu, YS
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2002, 144 (1-2) : 335 - 347
  • [9] Mutual Deep Deterministic Policy Gradient Learning
    Sun, Zhou
    2022 INTERNATIONAL CONFERENCE ON BIG DATA, INFORMATION AND COMPUTER NETWORK (BDICN 2022), 2022, : 508 - 513
  • [10] Deep Deterministic Policy Gradient for Portfolio Management
    Khemlichi, Firdaous
    Chougrad, Hiba
    Khamlichi, Youness Idrissi
    El Boushaki, Abdessamad
    Ben Ali, Safae Elhaj
    2020 6TH IEEE CONGRESS ON INFORMATION SCIENCE AND TECHNOLOGY (IEEE CIST'20), 2020, : 424 - 429