Beyond backpropagate through time: Efficient model-based training through time-splitting

Cited by: 0

Authors:

Gao, Jiaxin [1]

Guan, Yang [2]

Li, Wenyu [3]

Li, Shengbo Eben [2]

Ma, Fei [1]

Zheng, Jianfeng [4]

Wei, Junqing [4]

Zhang, Bo [4]

Li, Keqiang [2]

Affiliations:

[1] Univ Sci & Technol Beijing, Sch Mech Engn, Beijing, Peoples R China

[2] Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China

[3] Nankai Univ, Coll Artificial Intelligence, Tianjin, Peoples R China

[4] DiDi Chuxing, Urban Transportat Div, Beijing, Peoples R China

Source:

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS | 2022 / Vol. 37 / Issue 10

Funding:

US National Science Foundation;

Keywords:

model-based policy gradient; optimal control; parallel training; reinforcement learning; time-splitting; LEVEL; GAME; GO;

DOI:

10.1002/int.22928

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Model-based policy gradient (MBPG) has been employed to seek approximate solutions to optimal control problems. However, temporal dependencies couple adjacent states, so the training time grows linearly with the time horizon. This paper reshapes the MBPG training process with the time-splitting technique to establish a time-independent algorithm called Training Through Time-Splitting (T3S). First, the coupled variables are copied to obtain two independent variables, and an extra variable together with an equivalence constraint is introduced to keep the problem consistent. Then, the transformed problem is divided into subproblems with carefully derived loss functions. The subproblems have decoupled variables and shared policy networks, so they can be optimized concurrently. Guided by this algorithm design, the paper further proposes an asynchronous parallel training scheme to accelerate training. Numerical simulation shows that the T3S algorithm outperforms the MBPG algorithm by 83.6% in wall-clock time on a trajectory tracking task.
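
The decoupling idea in the abstract can be illustrated with a small sketch. This is not the T3S loss derived in the paper: it assumes a scalar linear system, a linear feedback policy, and a simple quadratic penalty in place of the paper's extra variable and equivalence constraint. All names and constants (`A`, `B`, `RHO`, `policy`, `split_losses`) are hypothetical. The point it shows is that once each state is treated as a free variable, the loss term for step t depends only on two neighbouring states and the shared policy parameter, so the terms can be evaluated independently.

```python
# Illustrative sketch only: a quadratic-penalty form of time-splitting on a
# scalar linear system. It is NOT the loss function derived in the paper.

A, B = 0.9, 0.5   # hypothetical dynamics: x_{t+1} = A * x_t + B * u_t
RHO = 10.0        # hypothetical penalty weight on the consistency gap


def policy(k, x):
    """Linear feedback policy with a single shared parameter k."""
    return -k * x


def step(k, x):
    """One step of the (assumed known) model under the policy."""
    return A * x + B * policy(k, x)


def stage_cost(k, x):
    """Quadratic cost on state and control."""
    u = policy(k, x)
    return x * x + 0.1 * u * u


def rollout_loss(k, x0, horizon):
    """Coupled loss: every state is computed from the previous one."""
    x, total = x0, 0.0
    for _ in range(horizon):
        total += stage_cost(k, x)
        x = step(k, x)
    return total


def split_losses(k, states):
    """Decoupled losses: term t reads only states[t] and states[t + 1]."""
    losses = []
    for t in range(len(states) - 1):
        gap = states[t + 1] - step(k, states[t])  # consistency violation
        losses.append(stage_cost(k, states[t]) + 0.5 * RHO * gap * gap)
    return losses


# States that satisfy the dynamics exactly, so every gap is zero.
k, x0, horizon = 0.8, 1.0, 5
states = [x0]
for _ in range(horizon):
    states.append(step(k, states[-1]))
```

When the free states agree with the model, the penalty vanishes and the per-step terms add up to the ordinary rollout loss; when they disagree, the penalty pushes them back towards consistency. Because `split_losses` has no loop-carried dependency, its terms could be handed to separate workers.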


Pages: 8046-8067

Number of pages: 22

