Beyond backpropagate through time: Efficient model-based training through time-splitting

Cited by: 0
Authors
Gao, Jiaxin [1]
Guan, Yang [2]
Li, Wenyu [3]
Li, Shengbo Eben [2]
Ma, Fei [1]
Zheng, Jianfeng [4]
Wei, Junqing [4]
Zhang, Bo [4]
Li, Keqiang [2]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Mech Engn, Beijing, Peoples R China
[2] Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China
[3] Nankai Univ, Coll Artificial Intelligence, Tianjin, Peoples R China
[4] DiDi Chuxing, Urban Transportat Div, Beijing, Peoples R China
Funding
National Science Foundation (USA);
Keywords
model-based policy gradient; optimal control; parallel training; reinforcement learning; time-splitting; LEVEL; GAME; GO;
DOI
10.1002/int.22928
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Model-based policy gradient (MBPG) has been employed to seek approximate solutions to optimal control problems. However, temporal dependencies couple adjacent states, so the training time grows linearly with the time horizon. This paper reshapes the MBPG training process with a time-splitting technique to establish a time-independent algorithm called Training Through Time-Splitting (T3S). First, the coupled variables are copied to obtain two independent variables, and an extra variable together with an equivalence constraint is introduced to keep the problem consistent. Then, the transformed problem is divided into subproblems with carefully derived loss functions. The subproblems have decoupled variables and share policy networks, so they can be optimized concurrently. Guided by this algorithm design, the paper further proposes an asynchronous parallel training scheme to improve training efficiency. Numerical simulation shows that T3S outperforms MBPG by 83.6% in wall-clock time on a trajectory tracking task.
Pages: 8046-8067
Page count: 22