More on training strategies for critic and action neural networks in dual heuristic programming method

Cited: 0
Authors:
Lendaris, GG
Paintz, C
Shannon, T
Affiliations:
Keywords:
DOI: not available
CLC classification: TP3 [computing technology, computer technology]
Subject classification: 0812
Abstract
This paper, for the special session on Adaptive Critic Design Methods at the SMC '97 Conference, describes a modification to the (to date) usual procedures reported for training the Critic and Action neural networks in the Dual Heuristic Programming (DHP) method [7]-[12]. This modification entails updating both the Critic and the Action networks each computational cycle, rather than only one at a time. The distinction lies in the introduction of a (real) second copy of the Critic network whose weights are adjusted less often (once per "epoch", where an epoch is defined to comprise some number N > 1 of computational cycles); the "desired value" for training the other Critic is obtained from this Critic-Copy. In a previous publication [4], the proposed modified training strategy was demonstrated on the well-known pole-cart controller problem. In that paper, the full 6-dimensional state vector was input to the Critic and Action NNs; however, the utility function involved only pole angle, not distance along the track (x). For the first set of results presented here, the 3 states associated with the x variable were eliminated from the inputs to the NNs, keeping the same utility function previously defined. This resulted in improved learning and controller performance. From this point, the method is applied to two additional problems of increasing complexity: in the first, an x-related term is added to the utility function for the pole-cart problem and, simultaneously, the x-related states are added back into the NNs (i.e., the number of state variables used is increased from 3 to 6); the second relates to steering a vehicle with independent drive motors on each wheel. The problem contexts and experimental results are provided.
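The training schedule the abstract describes (update Critic and Action every cycle, but take the Critic's training target from a second Critic-Copy refreshed only once per epoch of N cycles) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the linear critic, the stand-in plant model, the utility gradient, and all names (`critic`, `w_copy`, `N`, learning rate, discount) are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic(w, s):
    # lambda(s) ~ dJ/ds; a linear critic stands in for the neural network
    return w @ s

N = 10                      # cycles per epoch (N > 1), assumed value
dim = 3
w_critic = rng.normal(size=(dim, dim)) * 0.1
w_copy = w_critic.copy()    # the delayed "Critic-Copy"
lr, gamma = 0.01, 0.95

for cycle in range(100):
    s = rng.normal(size=dim)            # current state (stand-in plant)
    s_next = 0.9 * s                    # stand-in one-step model, ds'/ds = 0.9*I
    dU_ds = s                           # stand-in utility-function gradient
    # DHP-style target: dU/ds + gamma * (ds'/ds)^T * lambda_copy(s'),
    # evaluated with the slowly updated Critic-Copy
    target = dU_ds + gamma * 0.9 * critic(w_copy, s_next)
    err = critic(w_critic, s) - target
    w_critic -= lr * np.outer(err, s)   # Critic updated EVERY cycle
    # (the Action network would also be updated here every cycle)
    if (cycle + 1) % N == 0:
        w_copy = w_critic.copy()        # Copy refreshed once per epoch
```

The point of the sketch is the schedule, not the models: the fast Critic chases a target held fixed by the Copy within each epoch, which is what lets both Critic and Action train concurrently each cycle.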
Pages: 3067-3072
Page count: 6
Related papers (50 total):
  • [21] Local Critic Training for Model-Parallel Learning of Deep Neural Networks
    Lee, Hojung
    Hsieh, Cho-Jui
    Lee, Jong-Seok
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (09) : 4424 - 4436
  • [22] Convergence of action dependent dual heuristic dynamic programming algorithms in LQ control tasks
    Krokavec, D
    INTELLIGENT TECHNOLOGIES - THEORY AND APPLICATIONS: NEW TRENDS IN INTELLIGENT TECHNOLOGIES, 2002, 76 : 72 - 80
  • [23] USING DYNAMIC PROGRAMMING AND NEURAL NETWORKS TO MATCH HUMAN ACTION
    Vajda, Tamas
    Bako, Laszlo
    Brassai, Sandor Tihamer
    PROCEEDINGS OF 11TH INTERNATIONAL CARPATHIAN CONTROL CONFERENCE, 2010, : 231 - 234
  • [24] Training multilayer feedforward neural networks using dynamic programming
    Sun, M
    PROCEEDINGS OF THE TWENTY-EIGHTH SOUTHEASTERN SYMPOSIUM ON SYSTEM THEORY, 1996, : 163 - 167
  • [25] Training Strategies for Convolutional Neural Networks with Transformed Input
    Khandani, Masoumeh Kalantari
    Mikhael, Wasfy B.
    2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021, : 1058 - 1061
  • [26] CONVOLUTIONAL NEURAL NETWORKS AND TRAINING STRATEGIES FOR SKIN DETECTION
    Kim, Yoonsik
    Hwang, Insung
    Cho, Nam Ik
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3919 - 3923
  • [27] Facial Action Units for Training Convolutional Neural Networks
    Trinh Thi Doan Pham
    Won, Chee Sun
    IEEE ACCESS, 2019, 7 : 77816 - 77824
  • [28] Heuristic dynamic programming for neural networks learning - Part 1: Learning as a control problem
    Krawczak, M
    NEURAL NETWORKS AND SOFT COMPUTING, 2003, : 218 - 223
  • [29] A Dual-Dimer method for training physics-constrained neural networks with minimax architecture
    Liu, Dehao
    Wang, Yan
    NEURAL NETWORKS, 2021, 136 : 112 - 125
  • [30] Heuristic dynamic programming for neural networks learning - Part 2: I-order differential dynamic programming
    Krawczak, M
    NEURAL NETWORKS AND SOFT COMPUTING, 2003, : 224 - 229