Fast training of a transformer for global multi-horizon time series forecasting on tensor processing units

Cited: 0
|
Authors
Garcia-Nava, J-Luis [1 ]
Flores, Juan J. [1 ,2 ]
Tellez, Victor M. [1 ]
Calderon, Felix [1 ]
Affiliations
[1] Univ Michoacana, Sch Elect Engn, Morelia 58030, Michoacan, Mexico
[2] Univ Oregon, Eugene, OR 97403 USA
Source
JOURNAL OF SUPERCOMPUTING | 2023, Vol. 79, Issue 08
Keywords
Deep learning; Self-attention; Cloud computing; TPU;
DOI
10.1007/s11227-022-05009-x
Chinese Library Classification
TP3 [Computing technology; Computer technology];
Discipline code
0812 ;
Abstract
Time Series Forecasting (TSF) is essential to key domains, and the Transformer neural network has advanced the state-of-the-art on global, multi-horizon TSF benchmarks. The quadratic time and memory complexity of the Vanilla Transformer (VT) hinders its application to Big Data environments; therefore, multiple efficient variants of the VT that lower complexity via sparse self-attention have been proposed. However, less complex algorithms do not directly produce faster executions, and machine learning models for Big Data are typically trained on accelerators designed for dense-matrix computation, which perform poorly on sparse matrices. To better compare the accuracy-speed trade-off of the VT and its variants, it is essential to test them on such accelerators. We implemented a cloud-based VT on Tensor Processing Units to address this task. Experiments on large-scale datasets show that our Transformer achieves good predictive performance when compared to state-of-the-art models while reducing training times from hours to under 2 minutes.
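The quadratic cost the abstract refers to comes from the (L, L) attention-score matrix of scaled dot-product self-attention, which grows with the square of the input window length L. A minimal NumPy sketch of a single attention head (illustrative shapes and names, not the paper's implementation):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention for one head.

    The (L, L) score matrix below is the source of the vanilla
    Transformer's quadratic time and memory cost in sequence length L.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)               # shape (L, L): quadratic in L
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, scores.shape

rng = np.random.default_rng(0)
L, d_model = 96, 16                             # e.g. a 96-step input window
x = rng.standard_normal((L, d_model))
w = [rng.standard_normal((d_model, d_model)) for _ in range(3)]
out, score_shape = self_attention(x, *w)
print(out.shape, score_shape)                   # (96, 16) (96, 96)
```

Doubling L from 96 to 192 quadruples the score matrix (from 96 × 96 to 192 × 192 entries); the sparse-attention variants mentioned in the abstract aim to avoid materializing this full matrix.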
Pages: 8475-8498
Page count: 24
Related papers
38 records total
  • [1] Fast training of a transformer for global multi-horizon time series forecasting on tensor processing units
    J.-Luis García-Nava
    Juan J. Flores
    Victor M. Tellez
    Felix Calderon
    [J]. The Journal of Supercomputing, 2023, 79 : 8475 - 8498
  • [2] Multi-Horizon Ternary Time Series Forecasting
    Htike, Zaw Zaw
    [J]. 2013 SIGNAL PROCESSING: ALGORITHMS, ARCHITECTURES, ARRANGEMENTS, AND APPLICATIONS (SPA), 2013, : 337 - 342
  • [3] Progressive neural network for multi-horizon time series forecasting
    Lin, Yang
    [J]. INFORMATION SCIENCES, 2024, 661
  • [4] Multi-Horizon Time Series Forecasting with Temporal Attention Learning
    Fan, Chenyou
    Zhang, Yuze
    Pan, Yi
    Li, Xiaoyue
    Zhang, Chi
    Yuan, Rong
    Wu, Di
    Wang, Wensheng
    Pei, Jian
    Huang, Heng
    [J]. KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 2527 - 2535
  • [5] Multiplicative Attention Mechanism for Multi-horizon Time Series Forecasting
    Cui, Runpeng
    Wang, Jianqiang
    Wang, Zheng
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [6] Temporal Fusion Transformers for interpretable multi-horizon time series forecasting
    Lim, Bryan
    Arik, Sercan O.
    Loeff, Nicolas
    Pfister, Tomas
    [J]. INTERNATIONAL JOURNAL OF FORECASTING, 2021, 37 (04) : 1748 - 1764
  • [7] Hypertuned temporal fusion transformer for multi-horizon time series forecasting of dam level in hydroelectric power plants
    Stefenon, Stefano Frizzo
    Seman, Laio Oriel
    Silva, Luiza Scapinello Aquino da
    Mariani, Viviana Cocco
    Coelho, Leandro dos Santos
    [J]. INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2024, 157
  • [8] Multi-horizon Irradiation Forecasting For Mediterranean Locations Using Time Series Models
    Paoli, Christophe
    Voyant, Cyril
    Muselli, Marc
    Nivet, Marie-Laure
    [J]. 2013 ISES SOLAR WORLD CONGRESS, 2014, 57 : 1354 - 1363
  • [9] Multi-horizon solar radiation forecasting for Mediterranean locations using time series models
    Voyant, Cyril
    Paoli, Christophe
    Muselli, Marc
    Nivet, Marie-Laure
    [J]. RENEWABLE & SUSTAINABLE ENERGY REVIEWS, 2013, 28 : 44 - 52
  • [10] Dynamic Co-Attention Networks for multi-horizon forecasting in multivariate time series
    He, Xiaoyu
    Shi, Suixiang
    Geng, Xiulin
    Xu, Lingyu
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2022, 135 : 72 - 84