Policy Optimization with Stochastic Mirror Descent

Cited by: 0
Authors
Yang, Long [1]
Zhang, Yu [2]
Zheng, Gang [1]
Zheng, Qian [1, 3]
Li, Pengfei [1]
Huang, Jianghang [1]
Pan, Gang [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
[2] Netease Games AI Lab, Hangzhou, Zhejiang, Peoples R China
[3] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore
Keywords
GAME; GO;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes the VRMPO algorithm: a sample-efficient policy gradient method with stochastic mirror descent. In VRMPO, a novel variance-reduced policy gradient estimator is presented to improve sample efficiency. We prove that the proposed VRMPO needs only O(ε^{-3}) sample trajectories to achieve an ε-approximate first-order stationary point, which matches the best sample complexity for policy optimization. Extensive empirical results demonstrate that VRMPO outperforms the state-of-the-art policy gradient methods in various settings.
Pages: 8823-8831
Number of pages: 9