Policy Optimization with Second-Order Advantage Information

Authors
Li, Jiajin [1]
Wang, Baoxiang [1]
Zhang, Shengyu [1,2]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Tencent, Shenzhen, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Policy optimization on high-dimensional continuous control tasks is difficult because policy gradient estimators have large variance. We present the action subspace dependent gradient (ASDG) estimator, which incorporates the Rao-Blackwell theorem (RB) and control variates (CV) into a unified framework to reduce this variance. To invoke RB, our proposed algorithm (POSA) learns the underlying factorization structure of the action space from second-order advantage information. POSA captures the quadratic information explicitly and efficiently using a wide & deep architecture. Empirical studies show that the proposed approach improves performance on high-dimensional synthetic settings and on OpenAI Gym's MuJoCo continuous control tasks.
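To make the variance problem concrete, the following is a minimal sketch of a plain control variate on a toy problem. It is not the ASDG estimator or the POSA algorithm from the paper. It assumes a unit-variance Gaussian policy, a hypothetical quadratic reward with optimum `target`, and a constant baseline equal to the analytically known expected reward. The function name and all parameters are illustrative assumptions.

```python
import numpy as np


def score_function_gradients(n_samples=20000, dim=10, seed=0):
    """Per-sample policy gradient estimates, without and with a constant baseline.

    Toy setup (all assumptions, not taken from the paper):
      policy  a ~ N(mu, I)
      reward  f(a) = -||a - target||^2
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)       # policy mean, the parameter we differentiate
    target = np.ones(dim)    # hypothetical reward optimum

    actions = mu + rng.standard_normal((n_samples, dim))
    rewards = -np.sum((actions - target) ** 2, axis=1)

    # Score function of a unit-variance Gaussian: grad_mu log p(a) = a - mu.
    score = actions - mu

    # Constant control variate: the expected reward, known in closed form
    # for this toy problem. Any constant baseline keeps the estimator unbiased
    # because the score function has zero mean.
    baseline = -(dim + np.sum((mu - target) ** 2))

    plain = rewards[:, None] * score
    with_cv = (rewards - baseline)[:, None] * score
    return plain, with_cv


plain, with_cv = score_function_gradients()

# The exact gradient here is -2 * (mu - target) = 2 in every coordinate.
print("mean estimate (plain):   ", plain.mean(axis=0).round(2))
print("mean estimate (baseline):", with_cv.mean(axis=0).round(2))
print("total variance (plain):   ", plain.var(axis=0).sum().round(1))
print("total variance (baseline):", with_cv.var(axis=0).sum().round(1))
```

Both estimators target the same gradient, but subtracting the baseline removes the large constant offset in the reward, so the per-sample estimates scatter far less. The abstract describes combining this kind of control variate with Rao-Blackwellization over action subspaces, which this sketch does not attempt.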
Pages: 5038-5044
Page count: 7