Policy Optimization with Stochastic Mirror Descent

Cited by: 0

Authors:

Yang, Long [1]

Zhang, Yu [2]

Zheng, Gang [1]

Zheng, Qian [1,3]

Li, Pengfei [1]

Huang, Jianghang [1]

Pan, Gang [1]

Affiliations:

[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China

[2] Netease Games AI Lab, Hangzhou, Zhejiang, Peoples R China

[3] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore

Source:

THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022

Keywords:

GAME; GO;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes the VRMPO algorithm, a sample-efficient policy gradient method based on stochastic mirror descent. VRMPO introduces a novel variance-reduced policy gradient estimator to improve sample efficiency. We prove that VRMPO needs only O(ε^{-3}) sample trajectories to achieve an ε-approximate first-order stationary point, which matches the best sample complexity for policy optimization. Extensive empirical results demonstrate that VRMPO outperforms state-of-the-art policy gradient methods in various settings.

Pages: 8823-8831

Page count: 9
