Fast training of a transformer for global multi-horizon time series forecasting on tensor processing units

Cited: 0
|
Authors
Garcia-Nava, J-Luis [1 ]
Flores, Juan J. [1 ,2 ]
Tellez, Victor M. [1 ]
Calderon, Felix [1 ]
Affiliations
[1] Univ Michoacana, Sch Elect Engn, Morelia 58030, Michoacan, Mexico
[2] Univ Oregon, Eugene, OR 97403 USA
Source
JOURNAL OF SUPERCOMPUTING | 2023, Vol. 79, Issue 08
Keywords
Deep learning; Self-attention; Cloud computing; TPU;
DOI
10.1007/s11227-022-05009-x
Chinese Library Classification
TP3 [Computing technology; Computer technology];
Discipline code
0812 ;
Abstract
Time Series Forecasting (TSF) is essential to key domains, and the Transformer neural network has advanced the state-of-the-art on global, multi-horizon TSF benchmarks. The quadratic time and memory complexity of the Vanilla Transformer (VT) hinders its application to Big Data environments; therefore, multiple efficient variants of the VT that lower complexity via sparse self-attention have been proposed. However, less complex algorithms do not directly produce faster executions, and machine learning models for Big Data are typically trained on accelerators designed for dense-matrix computation, which perform poorly on sparse matrices. To better compare the accuracy-speed trade-off of the VT and its variants, it is essential to test them on such accelerators. We implemented a cloud-based VT on Tensor Processing Units to address this task. Experiments on large-scale datasets show that our Transformer achieves good predictive performance when compared to state-of-the-art models while reducing training times from hours to under 2 minutes.
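The quadratic cost the abstract refers to comes from the (L, L) attention-score matrix of scaled dot-product self-attention, which grows with the square of the input window length L. A minimal NumPy sketch of a single attention head (illustrative shapes and names, not the paper's implementation):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention for one head.

    The (L, L) score matrix below is the source of the vanilla
    Transformer's quadratic time and memory cost in sequence length L.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)               # shape (L, L): quadratic in L
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, scores.shape

rng = np.random.default_rng(0)
L, d_model = 96, 16                             # e.g. a 96-step input window
x = rng.standard_normal((L, d_model))
w = [rng.standard_normal((d_model, d_model)) for _ in range(3)]
out, score_shape = self_attention(x, *w)
print(out.shape, score_shape)                   # (96, 16) (96, 96)
```

Doubling L from 96 to 192 quadruples the score matrix (from 96 × 96 to 192 × 192 entries); the sparse-attention variants mentioned in the abstract aim to avoid materializing this full matrix.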
Pages: 8475-8498
Page count: 24
Related papers
38 records total
  • [1] Fast training of a transformer for global multi-horizon time series forecasting on tensor processing units
    J.-Luis García-Nava
    Juan J. Flores
    Victor M. Tellez
    Felix Calderon
    [J]. The Journal of Supercomputing, 2023, 79 : 8475 - 8498
  • [2] Multi-Horizon Ternary Time Series Forecasting
    Htike, Zaw Zaw
    [J]. 2013 SIGNAL PROCESSING: ALGORITHMS, ARCHITECTURES, ARRANGEMENTS, AND APPLICATIONS (SPA), 2013, : 337 - 342
  • [3] Progressive neural network for multi-horizon time series forecasting
    Lin, Yang
    [J]. INFORMATION SCIENCES, 2024, 661
  • [4] Multi-Horizon Time Series Forecasting with Temporal Attention Learning
    Fan, Chenyou
    Zhang, Yuze
    Pan, Yi
    Li, Xiaoyue
    Zhang, Chi
    Yuan, Rong
    Wu, Di
    Wang, Wensheng
    Pei, Jian
    Huang, Heng
    [J]. KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 2527 - 2535
  • [5] Multiplicative Attention Mechanism for Multi-horizon Time Series Forecasting
    Cui, Runpeng
    Wang, Jianqiang
    Wang, Zheng
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [6] Temporal Fusion Transformers for interpretable multi-horizon time series forecasting
    Lim, Bryan
    Arik, Sercan O.
    Loeff, Nicolas
    Pfister, Tomas
    [J]. INTERNATIONAL JOURNAL OF FORECASTING, 2021, 37 (04) : 1748 - 1764
  • [7] Hypertuned temporal fusion transformer for multi-horizon time series forecasting of dam level in hydroelectric power plants
    Stefenon, Stefano Frizzo
    Seman, Laio Oriel
    Silva, Luiza Scapinello Aquino da
    Mariani, Viviana Cocco
    Coelho, Leandro dos Santos
    [J]. INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2024, 157
  • [8] Multi-horizon Irradiation Forecasting For Mediterranean Locations Using Time Series Models
    Paoli, Christophe
    Voyant, Cyril
    Muselli, Marc
    Nivet, Marie-Laure
    [J]. 2013 ISES SOLAR WORLD CONGRESS, 2014, 57 : 1354 - 1363
  • [9] Multi-horizon solar radiation forecasting for Mediterranean locations using time series models
    Voyant, Cyril
    Paoli, Christophe
    Muselli, Marc
    Nivet, Marie-Laure
    [J]. RENEWABLE & SUSTAINABLE ENERGY REVIEWS, 2013, 28 : 44 - 52
  • [10] Dynamic Co-Attention Networks for multi-horizon forecasting in multivariate time series
    He, Xiaoyu
    Shi, Suixiang
    Geng, Xiulin
    Xu, Lingyu
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2022, 135 : 72 - 84