Actor-critic algorithm with incremental dual natural policy gradient

Cited by: 0
Authors
Zhang P. [1]
Liu Q. [1,2,3]
Zhong S. [1]
Zhai J.-W. [1]
Qian W.-S. [1]
Affiliations
[1] School of Computer Science and Technology, Soochow University, Suzhou
[2] Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing
[3] Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun
Source
Journal on Communications, Vol. 38, 2017
Funding
National Natural Science Foundation of China
Keywords
Actor-critic; Continuous space; Natural gradient; Reinforcement learning;
DOI
10.11959/j.issn.1000-436x.2017089
Abstract
Existing algorithms for continuous action spaces fail to consider how the optimal action is selected and how knowledge of the action space can be exploited, so an efficient actor-critic algorithm was proposed by improving the natural gradient. The objective of the proposed algorithm was to maximize the expected return. The upper and lower bounds of the action range were weighted to obtain the optimal action, and both bounds were approximated by linear functions. The problem of obtaining the optimal action was thereby transformed into learning two policy parameter vectors. To speed up learning, an incremental Fisher information matrix and eligibilities for both bounds were designed. On three reinforcement learning problems, compared with other representative continuous-action methods, the simulation results show that the proposed algorithm converges faster and with higher stability. © 2017, Editorial Board of Journal on Communications. All rights reserved.
Pages: 166-177
Number of pages: 11
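
The abstract describes the algorithm's structure but not its update equations, so the following is a minimal sketch of the dual-bound natural actor-critic idea, assuming a Gaussian exploration policy around the weighted combination of two linear bound approximators, a fixed mixing weight, and a Fisher information matrix estimated incrementally as a running mean of score outer products; the feature map, step sizes, and class name are illustrative assumptions, not the paper's definitions.

```python
# Sketch of a dual-bound natural actor-critic for a one-dimensional continuous
# action space. Only the overall structure (two linear bound approximators,
# eligibilities for both, an incrementally estimated Fisher information matrix,
# and a natural-gradient actor step) follows the abstract; everything else is assumed.
import numpy as np


class DualNaturalActorCritic:
    def __init__(self, n_features, a_low, a_high,
                 alpha=0.01, beta=0.1, gamma=0.99, lam=0.9, sigma=0.2):
        self.n = n_features
        self.a_low, self.a_high = a_low, a_high      # known action range
        self.alpha, self.beta = alpha, beta          # actor / critic step sizes (assumed)
        self.gamma, self.lam = gamma, lam            # discount, trace decay
        self.sigma = sigma                           # exploration noise (assumed)
        # dual policy parameter vectors: lower-bound and upper-bound approximators
        self.theta_l = np.zeros(n_features)
        self.theta_u = np.zeros(n_features)
        self.w_mix = 0.5                             # weighting of the two bounds (assumed fixed)
        # critic weights and eligibility traces for the critic and both bounds
        self.v = np.zeros(n_features)
        self.e_v = np.zeros(n_features)
        self.e_l = np.zeros(n_features)
        self.e_u = np.zeros(n_features)
        # incremental Fisher information estimate over the stacked policy parameters
        self.fisher = np.eye(2 * n_features) * 1e-3
        self.t = 0

    def mean_action(self, phi):
        # action = weighted combination of the two linear bound estimates,
        # clipped to the known action range
        a_l = float(self.theta_l @ phi)
        a_u = float(self.theta_u @ phi)
        a = self.w_mix * a_u + (1.0 - self.w_mix) * a_l
        return float(np.clip(a, self.a_low, self.a_high))

    def act(self, phi):
        # Gaussian exploration around the combined action (an assumption)
        return float(np.clip(self.mean_action(phi) + self.sigma * np.random.randn(),
                             self.a_low, self.a_high))

    def update(self, phi, a, r, phi_next, done):
        # 1) critic: TD error with an eligibility trace
        v_next = 0.0 if done else float(self.v @ phi_next)
        delta = r + self.gamma * v_next - float(self.v @ phi)
        self.e_v = self.gamma * self.lam * self.e_v + phi
        self.v += self.beta * delta * self.e_v

        # 2) score of the Gaussian policy w.r.t. each bound's parameter vector
        diff = (a - self.mean_action(phi)) / (self.sigma ** 2)
        grad_l = diff * (1.0 - self.w_mix) * phi
        grad_u = diff * self.w_mix * phi
        self.e_l = self.gamma * self.lam * self.e_l + grad_l
        self.e_u = self.gamma * self.lam * self.e_u + grad_u

        # 3) incremental Fisher information matrix: running mean of score outer products
        g = np.concatenate([grad_l, grad_u])
        self.t += 1
        self.fisher += (np.outer(g, g) - self.fisher) / self.t

        # 4) natural-gradient actor step: precondition the vanilla gradient
        vanilla = delta * np.concatenate([self.e_l, self.e_u])
        natural = np.linalg.solve(self.fisher + 1e-6 * np.eye(2 * self.n), vanilla)
        self.theta_l += self.alpha * natural[:self.n]
        self.theta_u += self.alpha * natural[self.n:]
```

A caller would build a state feature vector phi(s), choose actions with act(), and call update() after every transition; the natural-gradient step preconditions the eligibility-weighted TD error by the inverse of the incrementally estimated Fisher matrix, which is what distinguishes this update from a vanilla policy-gradient actor.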
相关论文
共 21 条
  • [1] Sutton R.S., Barto A.G., Reinforcement Learning: An Introduction, (1998)
  • [2] Busoniu L., Babuska R., Schutter B.D., Et al., Reinforcement Learning and Dynamic Programming Using Function Approximators, (2010)
  • [3] Lee D., Seo H., Jung M.W., Neural basis of reinforcement learning and decision making, Annual Review of Neuroscience, 35, 5, pp. 287-308, (2012)
  • [4] Wiering M., Van O.M., Reinforcement learning: STATE-OF-THE-Art, (2014)
  • [5] Sutton R.S., McAllester D.A., Singh S.P., Et al., Policy gradient methods for reinforcement learning with function approximation, NIPS, 99, pp. 1057-1063, (1999)
  • [6] Peters J., Schaal S., Natural A.C., Neurocomputing, 71, 7-9, pp. 1180-1190, (2008)
  • [7] Peters J., Vijayakumar S., Schaal S., Reinforcement learning for humanoid robotics, Autonomous Robot, 12, 1, pp. 1-20, (2003)
  • [8] Van H.H., Reinforcement learning in continuous state and action spaces, Reinforcement Learning, pp. 207-251, (2012)
  • [9] Wierstra D., Schaul T., Peters J., Et al., Natural evolution strategies, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pp. 3381-3387, (2008)
  • [10] Sun Y., Wierstra D., Schaul T., Et al., Efficient natural evolution strategies, The 11th Annual Conference on Genetic and Evolutionary Computation, pp. 539-546, (2009)