Neural Networks and Differential Dynamic Programming for Reinforcement Learning Problems

Times cited: 0
Authors
Yamaguchi, Akihiko [1]
Atkeson, Christopher G. [1]
Affiliations
[1] Carnegie Mellon Univ, Inst Robot, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
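Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract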
We explore a model-based approach to reinforcement learning where partially or totally unknown dynamics are learned and explicit planning is performed. We learn dynamics with neural networks, and plan behaviors with differential dynamic programming (DDP). In order to handle complicated dynamics, such as manipulating liquids (pouring), we consider temporally decomposed dynamics. We start from our recent work [1] where we used locally weighted regression (LWR) to model dynamics. The major contribution of this paper is making use of deep learning in the form of neural networks with stochastic DDP, and showing the advantages of neural networks over LWR. For this purpose, we extend neural networks for: (1) modeling prediction error and output noise, (2) computing an output probability distribution for a given input distribution, and (3) computing gradients of output expectation with respect to an input. Since neural networks have nonlinear activation functions, these extensions were not easy. We provide an analytic solution for these extensions using some simplifying assumptions. We verified this method in pouring simulation experiments. The learning performance with neural networks was better than that of LWR. The amount of spilled materials was reduced. We also present early results of robot experiments using a PR2. Accompanying video: https://youtu.be/aM3hE1J5W98
Pages: 5434 - 5441
Page count: 8
Related papers
50 in total
  • [1] Heuristic dynamic programming for neural networks learning - Part 2: I-order differential dynamic programming
    Krawczak, M
    [J]. NEURAL NETWORKS AND SOFT COMPUTING, 2003, : 224 - 229
  • [2] Reinforcement learning of dynamic behavior by using recurrent neural networks
    Ahmet Onat
    Hajime Kita
    Yoshikazu Nishikawa
    [J]. Artificial Life and Robotics, 1997, 1 (3) : 117 - 121
  • [3] Enhancing Supervisory Training Signals with Environmental Reinforcement Learning Using Adaptive Dynamic Programming and Artificial Neural Networks
    Melton, Niklas
    Wunsch, Donald C., II
    [J]. 2016 IEEE 15TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2016, : 331 - 335
  • [4] Combining Reinforcement Learning Algorithms with Graph Neural Networks to Solve Dynamic Job Shop Scheduling Problems
    Yang, Zhong
    Bi, Li
    Jiao, Xiaogang
    [J]. PROCESSES, 2023, 11 (05)
  • [5] Integrating recurrent neural networks and reinforcement learning for dynamic service composition
    Wang, Hongbing
    Li, Jiajie
    Yu, Qi
    Hong, Tianjing
    Yan, Jia
    Zhao, Wei
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 107 : 551 - 563
  • [6] USING MODULAR NEURAL NETWORKS AND MACHINE LEARNING WITH REINFORCEMENT LEARNING TO SOLVE CLASSIFICATION PROBLEMS
    Leoshchenko, S. D.
    Oliinyk, A. O.
    Subbotin, S. A.
    Kolpakova, T. O.
    [J]. RADIO ELECTRONICS COMPUTER SCIENCE CONTROL, 2024, (02) : 71 - 81
  • [7] CHORDNET: Learning and producing voice leading with neural networks and dynamic programming
    Hörnel, D
    [J]. JOURNAL OF NEW MUSIC RESEARCH, 2004, 33 (04) : 387 - 397
  • [8] Dual representations for dynamic programming and reinforcement learning
    Wang, Tao
    Bowling, Michael
    Schuurmans, Dale
    [J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 44 - +
  • [9] Heuristic dynamic programming for neural networks learning - Part 1: Learning as a control problem
    Krawczak, M
    [J]. NEURAL NETWORKS AND SOFT COMPUTING, 2003, : 218 - 223
  • [10] Reinforcement Learning with Neural Networks: A Survey
    Modi, Bhumika
    Jethva, H. B.
    [J]. PROCEEDINGS OF FIRST INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY FOR INTELLIGENT SYSTEMS: VOL 1, 2016, 50 : 467 - 475