Beyond backpropagate through time: Efficient model-based training through time-splitting

Cited by: 0

Authors:

Gao, Jiaxin [1]

Guan, Yang [2]

Li, Wenyu [3]

Li, Shengbo Eben [2]

Ma, Fei [1]

Zheng, Jianfeng [4]

Wei, Junqing [4]

Zhang, Bo [4]

Li, Keqiang [2]

Affiliations:

[1] Univ Sci & Technol Beijing, Sch Mech Engn, Beijing, Peoples R China

[2] Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China

[3] Nankai Univ, Coll Artificial Intelligence, Tianjin, Peoples R China

[4] DiDi Chuxing, Urban Transportat Div, Beijing, Peoples R China

Source:

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS | 2022, Vol. 37, No. 10

Funding:

National Science Foundation (USA);

Keywords:

model-based policy gradient; optimal control; parallel training; reinforcement learning; time-splitting; LEVEL; GAME; GO;

DOI:

10.1002/int.22928

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Model-based policy gradient (MBPG) has been employed to seek approximate solutions to optimal control problems. However, temporal dependencies couple adjacent states, so the training time grows linearly with the time horizon. This paper reshapes the training process of MBPG with a time-splitting technique to establish a time-independent algorithm called Training Through Time-Splitting (T3S). First, the coupled variables are copied to obtain two independent variables, and an extra variable together with an equality constraint is introduced to keep the problem consistent. Then, the transformed problem is divided into subproblems with carefully derived loss functions. The subproblems have decoupled variables and share policy networks, so they can be optimized concurrently. Guided by this algorithm design, the paper further proposes an asynchronous parallel training scheme to accelerate training. Numerical simulation shows that the T3S algorithm outperforms the MBPG algorithm by 83.6% in wall-clock time on a trajectory tracking task.
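
The splitting step described in the abstract can be read roughly as follows. This is only an illustrative sketch, not the formulation from the paper: the symbols x_t, a_t, b_t, z_t, f, l, \pi_\theta and \rho are assumptions introduced here, and the quadratic penalty stands in for the paper's "carefully derived loss functions", which this record does not reproduce.

```latex
% Original finite-horizon problem (assumed generic form): states are coupled
% through the dynamics, so the gradient must be propagated through all T steps.
\begin{align}
  \min_{\theta}\; & \sum_{t=0}^{T-1} l\bigl(x_t, \pi_\theta(x_t)\bigr)
  \quad \text{s.t.} \quad x_{t+1} = f\bigl(x_t, \pi_\theta(x_t)\bigr).
\end{align}

% Time-splitting (illustrative): each step t gets its own copies a_t (the state
% it starts from) and b_t (the state it produces); an extra variable z ties the
% copies of neighbouring steps together through equality constraints.
\begin{align}
  \min_{\theta,\, a,\, z}\; & \sum_{t=0}^{T-1} l\bigl(a_t, \pi_\theta(a_t)\bigr) \\
  \text{s.t.}\; & b_t = f\bigl(a_t, \pi_\theta(a_t)\bigr), \qquad
                  a_t = z_t, \qquad b_t = z_{t+1}.
\end{align}

% Relaxing the equality constraints with a penalty weight \rho > 0 (an assumed
% choice) gives one loss per time step:
\begin{align}
  L_t(\theta, a_t; z) = l\bigl(a_t, \pi_\theta(a_t)\bigr)
    + \frac{\rho}{2}\Bigl( \lVert a_t - z_t \rVert^2
    + \lVert f\bigl(a_t, \pi_\theta(a_t)\bigr) - z_{t+1} \rVert^2 \Bigr).
\end{align}
```

With z held fixed, each L_t in this sketch depends only on its own copy a_t and the shared policy parameters, so the T terms can be evaluated and differentiated independently, each through a single step of the dynamics. That is the sense in which such subproblems could be optimized concurrently.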


Pages: 8046 - 8067

Page count: 22
