Neural Networks and Differential Dynamic Programming for Reinforcement Learning Problems

Cited by: 0

Authors:

Yamaguchi, Akihiko [1]

Atkeson, Christopher G. [1]

Affiliations:

[1] Carnegie Mellon Univ, Inst Robot, 5000 Forbes Ave, Pittsburgh, PA 15213 USA

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) | 2016

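Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;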

Abstract:

We explore a model-based approach to reinforcement learning where partially or totally unknown dynamics are learned and explicit planning is performed. We learn dynamics with neural networks, and plan behaviors with differential dynamic programming (DDP). In order to handle complicated dynamics, such as manipulating liquids (pouring), we consider temporally decomposed dynamics. We start from our recent work [1] where we used locally weighted regression (LWR) to model dynamics. The major contribution of this paper is making use of deep learning in the form of neural networks with stochastic DDP, and showing the advantages of neural networks over LWR. For this purpose, we extend neural networks for: (1) modeling prediction error and output noise, (2) computing an output probability distribution for a given input distribution, and (3) computing gradients of output expectation with respect to an input. Since neural networks have nonlinear activation functions, these extensions were not easy. We provide an analytic solution for these extensions using some simplifying assumptions. We verified this method in pouring simulation experiments. The learning performance with neural networks was better than that of LWR. The amount of spilled materials was reduced. We also present early results of robot experiments using a PR2. Accompanying video: https://youtu.be/aM3hE1J5W98
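The abstract does not state the analytic solution or its simplifying assumptions, so the following is only a rough illustration of extensions (2) and (3), not the authors' method. It assumes a small ReLU network with one hidden layer and uses a first-order (Jacobian) linearization around the input mean to approximate the output distribution for a Gaussian input. All names (`forward`, `jacobian`, `propagate_gaussian`) and the weights are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, b1, W2, b2):
    """Hypothetical network: x -> ReLU(W1 x + b1) -> W2 h + b2."""
    return W2 @ relu(W1 @ x + b1) + b2

def jacobian(x, W1, b1, W2, b2):
    """Analytic Jacobian of forward() with respect to x, evaluated at x."""
    active = (W1 @ x + b1 > 0).astype(float)  # ReLU derivative: 0 or 1 per unit
    return W2 @ (active[:, None] * W1)

def propagate_gaussian(mu, Sigma, W1, b1, W2, b2):
    """First-order approximation of the output distribution for x ~ N(mu, Sigma).

    Assumption for illustration only: the network is linearized at mu, so
    the output mean is forward(mu) and the output covariance is J Sigma J^T.
    Under the same approximation, J is also the gradient of the output
    expectation with respect to the input mean.
    """
    J = jacobian(mu, W1, b1, W2, b2)
    mean_out = forward(mu, W1, b1, W2, b2)
    cov_out = J @ Sigma @ J.T
    return mean_out, cov_out, J

# Made-up weights and input distribution, chosen only to exercise the code.
W1 = np.array([[1.0, 0.0], [0.0, -1.0]])
b1 = np.zeros(2)
W2 = np.array([[1.0, 2.0]])
b2 = np.array([0.5])
mu = np.array([1.0, 1.0])
Sigma = np.diag([0.04, 0.09])

mean_out, cov_out, J = propagate_gaussian(mu, Sigma, W1, b1, W2, b2)
print(mean_out, cov_out, J)
```

A linearization like this ignores how input variance shifts the mean through the nonlinearity, which is one reason an analytic treatment such as the one the abstract mentions would be preferable inside a stochastic planner.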


Pages: 5434-5441

Page count: 8
