On the Global Optimum Convergence of Momentum-based Policy Gradient

Citations: 0

Authors:

Ding, Yuhao [1]

Zhang, Junzi [2]

Lavaei, Javad [1]

Affiliations:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

[2] Amazon Advertising, San Francisco, CA USA

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151 | 2022 / Vol. 151

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by establishing the first set of global convergence results of stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the soft-max and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum term improves the global optimality sample complexities of vanilla PG methods by $\tilde{O}(\epsilon^{-1.5})$ and $\tilde{O}(\epsilon^{-1})$, respectively, where $\epsilon > 0$ is the target tolerance. Our results for the generic Fisher-non-degenerate policy parametrizations also provide the first single-loop and finite-batch PG algorithm achieving an $\tilde{O}(\epsilon^{-3})$ global optimality sample complexity. Finally, as a byproduct, our analyses provide general tools for deriving the global convergence rates of stochastic PG methods, which can be readily applied and extended to other PG estimators under the two parametrizations.
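To make the idea of a stochastic PG method with a momentum term concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the estimator analyzed in the paper. It assumes a hypothetical single-state problem (a multi-armed bandit), a soft-max policy, a REINFORCE-style gradient estimate, and a plain moving-average momentum buffer. The function name `momentum_pg_bandit` and the parameters `lr`, `beta`, and `noise` are illustrative choices, not taken from the source.

```python
import math
import random


def softmax(theta):
    """Numerically stable soft-max over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def momentum_pg_bandit(mean_rewards, steps=4000, lr=0.05, beta=0.2,
                       noise=0.1, seed=0):
    """Stochastic policy gradient with a moving-average momentum term.

    Hypothetical toy setup (an assumption, not the paper's algorithm):
    a single-state problem with a soft-max policy over the arms.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    k = len(mean_rewards)
    theta = [0.0] * k  # soft-max logits
    d = [0.0] * k      # momentum buffer: running average of gradient estimates
    for _ in range(steps):
        pi = softmax(theta)
        # Sample one action from the current policy and observe a noisy reward.
        a = rng.choices(range(k), weights=pi)[0]
        r = mean_rewards[a] + rng.gauss(0.0, noise)
        # REINFORCE estimate: r * grad log pi(a) = r * (e_a - pi).
        g = [r * ((1.0 if i == a else 0.0) - pi[i]) for i in range(k)]
        # Momentum step: blend the new estimate into the running average.
        d = [(1.0 - beta) * d[i] + beta * g[i] for i in range(k)]
        # Gradient ascent along the averaged direction.
        theta = [theta[i] + lr * d[i] for i in range(k)]
    return softmax(theta)


probs = momentum_pg_bandit([0.0, 0.3, 1.0])
print(probs)
```

Setting `beta=1.0` discards the buffer and recovers the vanilla single-sample update, which gives a simple way to compare the two variants on the same seed.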


Pages: 25
