Policy Optimization with Second-Order Advantage Information

Authors
Li, Jiajin [1]
Wang, Baoxiang [1]
Zhang, Shengyu [1, 2]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Tencent, Shenzhen, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Policy optimization on high-dimensional continuous control tasks is difficult because policy gradient estimators have large variance. We present the action subspace dependent gradient (ASDG) estimator, which incorporates the Rao-Blackwell theorem (RB) and control variates (CV) into a unified framework to reduce this variance. To invoke RB, our proposed algorithm (POSA) learns the underlying factorization structure of the action space from second-order advantage information. POSA captures this quadratic information explicitly and efficiently by using a wide & deep architecture. Empirical studies show that the approach improves performance on high-dimensional synthetic settings and on OpenAI Gym's MuJoCo continuous control tasks.
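The variance-reduction idea in the abstract can be illustrated with a small numerical sketch. The code below is not the ASDG estimator or the POSA algorithm; it is a toy comparison under strong assumptions: a unit-variance Gaussian policy, a hypothetical advantage function that is fully separable across action dimensions, and a factorization that is known in advance rather than learned. All names (`gradient_estimates`, `target`, and so on) are invented for this example. It compares a plain score-function gradient, the same gradient with a scalar baseline (a simple control variate), and a per-dimension version that only uses the advantage term each action coordinate actually influences (the Rao-Blackwell-style step).

```python
import numpy as np


def gradient_estimates(n_samples=200_000, dim=10, seed=0):
    """Per-sample gradient estimates w.r.t. the mean of a Gaussian policy.

    Toy setting (an assumption for illustration, not taken from the paper):
    a ~ N(mu, I) and A(a) = sum_i A_i(a_i) with A_i(a_i) = -(a_i - target_i)^2.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)       # policy mean, the parameter being differentiated
    target = np.ones(dim)    # hypothetical optimum of the toy advantage

    actions = mu + rng.standard_normal((n_samples, dim))
    score = actions - mu     # grad_mu log N(a; mu, I)

    per_dim_adv = -(actions - target) ** 2               # A_i(a_i), shape (N, dim)
    total_adv = per_dim_adv.sum(axis=1, keepdims=True)   # A(a), shape (N, 1)

    # Plain score-function estimator: every coordinate sees the full advantage.
    plain = score * total_adv
    # Control variate: subtract a scalar baseline (batch mean, so there is a
    # negligible O(1/N) bias in this sketch).
    baseline = score * (total_adv - total_adv.mean())
    # Factorized estimator: coordinate i only uses its own advantage term,
    # which is valid here because the other terms are independent of a_i.
    factorized = score * (per_dim_adv - per_dim_adv.mean(axis=0))

    return {"plain": plain, "baseline": baseline, "factorized": factorized}


if __name__ == "__main__":
    # The exact gradient in this toy problem is -2 * (mu - target) = 2 per coordinate.
    for name, grads in gradient_estimates().items():
        mean = grads.mean(axis=0).mean()
        var = grads.var(axis=0).mean()
        print(f"{name:>10}: mean gradient {mean:.3f}, per-sample variance {var:.1f}")
```

All three estimators target the same gradient, but the per-sample variance shrinks at each step. In this sketch the factorization is given by construction; according to the abstract, POSA instead learns it from second-order advantage information.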
Pages: 5038-5044
Page count: 7