Deterministic Policy Gradient: Convergence Analysis

Cited by: 0
Authors
Xiong, Huaqing [1 ]
Xu, Tengyu [1 ]
Zhao, Lin [2 ]
Liang, Yingbin [1 ]
Zhang, Wei [3 ]
Affiliations
[1] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH USA
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Southern Univ Sci & Technol SUSTech, Dept Mech & Energy Engn, Shenzhen, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China; US National Science Foundation
Keywords
MARKOV; APPROXIMATION; ALGORITHMS;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The deterministic policy gradient (DPG) method proposed in Silver et al. [2014] has been demonstrated to exhibit superior performance, particularly for applications with multi-dimensional and continuous action spaces. However, it remains unclear whether DPG converges, and if so, how fast it converges and whether it converges as efficiently as other PG methods. In this paper, we provide a theoretical analysis of DPG to answer these questions. We study the single-timescale DPG (as is often used in practice) in both on-policy and off-policy settings, and show that both algorithms attain an ε-accurate stationary policy, up to a system error, with a sample complexity of O(ε^{-2}). Moreover, we establish the convergence rate for DPG under Gaussian noise exploration, which is widely adopted in practice to improve the performance of DPG. To the best of our knowledge, this is the first non-asymptotic convergence characterization for DPG methods.
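For intuition about the algorithm analyzed above, below is a minimal sketch of a single-timescale DPG loop with Gaussian exploration noise: the critic takes one TD(0) step and the actor takes one deterministic-policy-gradient step per sample, with fixed step sizes on the same timescale. The linear features, the toy dynamics in env_step, and all hyperparameters are illustrative assumptions, not the setting of the paper.

```python
# Minimal single-timescale DPG sketch with Gaussian exploration noise.
# Illustrative only: the environment, features, and step sizes below are
# assumptions for demonstration, not the exact setup analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)

S_DIM, A_DIM = 3, 1
theta = np.zeros((S_DIM, A_DIM))      # linear deterministic policy: mu(s) = theta^T s
w = np.zeros(S_DIM * A_DIM)           # linear critic: Q_w(s, a) = w^T phi(s, a)
alpha, beta = 1e-3, 1e-2              # actor / critic step sizes (single timescale)
gamma, sigma = 0.95, 0.1              # discount factor, exploration noise std

def phi(s, a):
    # bilinear state-action features (an assumption made for illustration)
    return np.outer(s, a).ravel()

def mu(s):
    return theta.T @ s                # deterministic policy

def env_step(s, a):
    # stand-in linear dynamics with a quadratic cost (hypothetical environment)
    s_next = 0.9 * s + 0.1 * a.sum() + 0.01 * rng.standard_normal(S_DIM)
    r = -float(s @ s) - 0.1 * float(a @ a)
    return s_next, r

s = rng.standard_normal(S_DIM)
for t in range(10_000):
    a = mu(s) + sigma * rng.standard_normal(A_DIM)   # Gaussian noise exploration
    s_next, r = env_step(s, a)

    # Critic: one TD(0) step toward r + gamma * Q_w(s', mu(s')).
    td_err = r + gamma * w @ phi(s_next, mu(s_next)) - w @ phi(s, a)
    w += beta * td_err * phi(s, a)

    # Actor: deterministic policy gradient,
    # grad_theta J ~ grad_theta mu(s) * grad_a Q_w(s, a)|_{a = mu(s)}.
    grad_a_Q = w.reshape(S_DIM, A_DIM).T @ s         # d/da of w^T phi(s, a)
    theta += alpha * np.outer(s, grad_a_Q)

    s = s_next
```

Note that both theta and w are updated once per sample with fixed step sizes; this one-critic-step-per-actor-step structure is the single-timescale regime referred to in the abstract, as opposed to two-timescale schemes in which the critic runs on a much faster effective timescale.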
Pages: 2159-2169
Page count: 11
Related Papers
50 in total (entries [31]-[40] shown)
  • [31] Stability Analysis for Autonomous Vehicle Navigation Trained over Deep Deterministic Policy Gradient
    Cabezas-Olivenza, Mireya
    Zulueta, Ekaitz
    Sanchez-Chica, Ander
    Fernandez-Gamiz, Unai
    Teso-Fz-Betono, Adrian
    MATHEMATICS, 2023, 11 (01)
  • [32] Geometry and convergence of natural policy gradient methods
    Müller, J.
    Montúfar, G.
    INFORMATION GEOMETRY, 2024, 7 (Suppl 1) : 485 - 523
  • [33] On the Linear Convergence of Natural Policy Gradient Algorithm
    Khodadadian, Sajad
    Jhunjhunwala, Prakirt Raj
    Varma, Sushil Mahavir
    Maguluri, Siva Theja
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021 : 3794 - 3799
  • [34] Deterministic Gradient-Descent Learning of Linear Regressions: Adaptive Algorithms, Convergence Analysis and Noise Compensation
    Liu, Kang-Zhi
    Gan, Chao
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 7867 - 7877
  • [35] Guided deterministic policy optimization with gradient-free policy parameters information
    Shen, Chun
    Zhu, Sheng
    Han, Shuai
    Gong, Xiaoyu
    Lu, Shuai
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 231
  • [36] State Representation Learning for Minimax Deep Deterministic Policy Gradient
    Hu, Dapeng
    Jiang, Xuesong
    Wei, Xiumei
    Wang, Jian
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2019, PT I, 2019, 11775 : 481 - 487
  • [37] Deep Deterministic Policy Gradient With Prioritized Sampling for Power Control
    Zhou, Shiyang
    Cheng, Yufan
    Lei, Xia
    Duan, Huanhuan
    IEEE ACCESS, 2020, 8 : 194240 - 194250
  • [38] Controlling Bicycle Using Deep Deterministic Policy Gradient Algorithm
    Le Pham Tuyen
    Chung, TaeChoong
    2017 14TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI), 2017 : 413 - 417
  • [39] Semicentralized Deep Deterministic Policy Gradient in Cooperative StarCraft Games
    Xie, Dong
    Zhong, Xiangnan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (04) : 1584 - 1593
  • [40] Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms
    Zhang, Haifei
    Xu, Jian
    Zhang, Jian
    Liu, Quan
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022