Beyond backpropagate through time: Efficient model-based training through time-splitting

Cited by: 0
Authors
Gao, Jiaxin [1]
Guan, Yang [2]
Li, Wenyu [3]
Li, Shengbo Eben [2]
Ma, Fei [1]
Zheng, Jianfeng [4]
Wei, Junqing [4]
Zhang, Bo [4]
Li, Keqiang [2]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Mech Engn, Beijing, Peoples R China
[2] Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China
[3] Nankai Univ, Coll Artificial Intelligence, Tianjin, Peoples R China
[4] DiDi Chuxing, Urban Transportat Div, Beijing, Peoples R China
Funding
National Science Foundation (USA);
Keywords
model-based policy gradient; optimal control; parallel training; reinforcement learning; time-splitting; LEVEL; GAME; GO;
DOI
10.1002/int.22928
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Model-based policy gradient (MBPG) has been employed to seek approximate solutions to optimal control problems. However, temporal dependencies couple adjacent states, so the training time grows linearly with the time horizon. This paper reshapes the training process of MBPG with a time-splitting technique to establish a time-independent algorithm called Training Through Time-Splitting (T3S). First, the coupled variables are copied to obtain two independent variables, and an extra variable together with an equivalence constraint is introduced to keep the problem consistent. Then, the transformed problem is divided into subproblems with carefully derived loss functions. The subproblems have decoupled variables and shared policy networks, so they can be optimized concurrently. Guided by this algorithm design, the paper further proposes an asynchronous parallel training scheme to accelerate training. Numerical simulation shows that the T3S algorithm outperforms the MBPG algorithm by 83.6% in wall-clock time on a trajectory tracking task.
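The decoupling idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the scalar linear dynamics, the one-parameter policy `u = -k * x`, the constants `A`, `B`, `R`, `RHO`, and in particular the quadratic penalty on the consistency gap, which stands in for the paper's derived loss functions. The sketch only shows that, once each state is replaced by an independent copy `z[t]`, the objective becomes a sum of per-step terms that can be evaluated concurrently, and that it matches the coupled rollout objective when the copies satisfy the equivalence constraint.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical constants: scalar dynamics x' = A*x + B*u,
# control weight R, and penalty weight RHO on the consistency gap.
A, B = 0.9, 0.5
R, RHO = 0.1, 10.0

def policy(k, x):
    """One-parameter linear policy (assumed for illustration)."""
    return -k * x

def step(x, u):
    """One step of the assumed model."""
    return A * x + B * u

def stage_cost(x, u):
    return x * x + R * u * u

def rollout_loss(k, x0, horizon):
    """Coupled objective: each state depends on all earlier steps."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = policy(k, x)
        total += stage_cost(x, u)
        x = step(x, u)
    return total

def sub_loss(k, z_t, z_next):
    """Per-step term: stage cost plus a penalty on z_next - f(z_t, u_t)."""
    u = policy(k, z_t)
    gap = z_next - step(z_t, u)
    return stage_cost(z_t, u) + 0.5 * RHO * gap * gap

def split_loss(k, z, parallel=False):
    """Sum of independent per-step terms; z holds horizon + 1 state copies."""
    pairs = list(zip(z[:-1], z[1:]))
    if parallel:
        # Each term touches only (z_t, z_next, k), so order does not matter.
        with ThreadPoolExecutor() as pool:
            terms = list(pool.map(lambda p: sub_loss(k, p[0], p[1]), pairs))
    else:
        terms = [sub_loss(k, a, b) for a, b in pairs]
    return sum(terms)

def consistent_copies(k, x0, horizon):
    """State copies that satisfy the equivalence constraint exactly."""
    z = [x0]
    for _ in range(horizon):
        z.append(step(z[-1], policy(k, z[-1])))
    return z
```

For example, `split_loss(0.3, consistent_copies(0.3, 1.0, 5))` agrees with `rollout_loss(0.3, 1.0, 5)`, while copies that violate the constraint are charged the extra penalty term. Threads are used only to show that the terms are independent; a real speedup would need process-level or accelerator parallelism.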
Pages: 8046-8067
Page count: 22