Beyond backpropagate through time: Efficient model-based training through time-splitting

Cited by: 0
Authors
Gao, Jiaxin [1]
Guan, Yang [2]
Li, Wenyu [3]
Li, Shengbo Eben [2]
Ma, Fei [1]
Zheng, Jianfeng [4]
Wei, Junqing [4]
Zhang, Bo [4]
Li, Keqiang [2]
Affiliations
[1] University of Science and Technology Beijing, School of Mechanical Engineering, Beijing, People's Republic of China
[2] Tsinghua University, School of Vehicle and Mobility, Beijing 100084, People's Republic of China
[3] Nankai University, College of Artificial Intelligence, Tianjin, People's Republic of China
[4] DiDi Chuxing, Urban Transportation Division, Beijing, People's Republic of China
Funding
U.S. National Science Foundation
Keywords
model-based policy gradient; optimal control; parallel training; reinforcement learning; time-splitting; LEVEL; GAME; GO;
DOI
10.1002/int.22928
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Model-based policy gradient (MBPG) has been employed to seek an approximate solution to the optimal control problem. However, temporal dependencies couple adjacent states, so the training time grows linearly with the time horizon. This paper reshapes the training process of MBPG with the time-splitting technique to establish a time-independent algorithm called Training Through Time-Splitting (T3S). First, the coupled variables are copied to obtain two independent variables. Meanwhile, an extra variable together with an equivalence constraint is introduced for problem consistency. Then, the transformed problem is divided into subproblems with carefully derived loss functions. The subproblems have decoupled variables and share policy networks, which means they can be optimized concurrently. Guided by the algorithm design, this paper further proposes an asynchronous parallel training scheme to improve training efficiency. Numerical simulation shows that the T3S algorithm outperforms the MBPG algorithm by 83.6% in wall-clock time on a trajectory tracking task.
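The decoupling idea in the abstract can be illustrated with a generic toy sketch. This is not the T3S algorithm or its derived loss functions: the scalar linear dynamics, the linear feedback policy with gain `k`, the quadratic cost, the quadratic penalty with weight `RHO`, and every function name below are assumptions chosen only for illustration. The sketch treats each state as a free copy, so that the loss of step `t` depends only on the local pair `(xs[t], xs[t + 1])` and the shared policy parameter, and a penalty term stands in for the equivalence constraint.

```python
import math

# Hypothetical constants for a toy scalar problem (not from the paper).
A, B = 0.9, 0.5      # linear dynamics x' = A*x + B*u
R = 0.1              # control weight in the stage cost
RHO = 10.0           # penalty weight on the consistency gap


def policy(x, k):
    """Linear feedback policy shared by all subproblems."""
    return -k * x


def dynamics(x, u):
    return A * x + B * u


def stage_cost(x, u):
    return x * x + R * u * u


def rollout(x0, k, horizon):
    """Sequential rollout: each state depends on the previous one."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(dynamics(xs[-1], policy(xs[-1], k)))
    return xs


def rollout_cost(x0, k, horizon):
    """Coupled objective, evaluated by stepping through time."""
    xs = rollout(x0, k, horizon)
    return sum(stage_cost(x, policy(x, k)) for x in xs[:-1])


def subproblem_loss(x_start, x_next_copy, k):
    """Loss of one time step, using only local variables.

    x_next_copy is a free copy of the next state; the penalty term
    pulls it toward the state predicted by the model.
    """
    u = policy(x_start, k)
    gap = x_next_copy - dynamics(x_start, u)
    return stage_cost(x_start, u) + 0.5 * RHO * gap * gap


def split_losses(xs, k):
    """Per-step losses; each entry could be computed by a separate worker."""
    return [subproblem_loss(xs[t], xs[t + 1], k) for t in range(len(xs) - 1)]
```

When the state copies coincide with a true rollout, every consistency gap vanishes and the sum of the per-step losses equals the coupled rollout cost, which is the sense in which the split problem stays consistent with the original one. A real implementation would replace the scalar gain with a policy network and compute the subproblem gradients concurrently.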
Pages: 8046-8067
Page count: 22