Rethink the Top-u Attention in Sparse Self-attention for Long Sequence Time-Series Forecasting

Cited by: 0
Authors
Meng, Xiangxu [1]
Li, Wei [1,2]
Gaber, Tarek [3]
Zhao, Zheng [1]
Chen, Chuhao [1]
Affiliations
[1] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Harbin Engn Univ, Modeling & Emulat E Govt Natl Engn Lab, Harbin 150001, Peoples R China
[3] Univ Salford, Sch Sci Engn & Environm, Manchester, England
Funding
National Natural Science Foundation of China
Keywords
Time-series; Top-u Attention; Long-tailed distribution; Sparse self-attention;
DOI
10.1007/978-3-031-44223-0_21
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Long time-series forecasting plays a crucial role in production and daily life, covering areas such as electric power load, stock trends and road traffic. Attention-based models have achieved significant performance gains thanks to the long-range modelling capability of self-attention. However, to address the much-criticized quadratic time complexity of the self-attention mechanism, most subsequent work has tried to exploit the sparse distribution of attention. Following this line of work, we further investigate the positional distribution of Top-u attention within the long-tailed distribution of sparse attention and propose a two-stage self-attention mechanism named ProphetAttention. Specifically, during training ProphetAttention memorizes the positions of the Top-u attention, and during prediction it uses the recorded position indices to retrieve the Top-u attention directly for the sparse attention computation, thereby avoiding the redundant computation of measuring Top-u attention. Results on four widely used real-world datasets demonstrate that, compared with the Informer model, ProphetAttention improves the prediction efficiency of long sequence time-series by approximately 17%-26% across all prediction horizons and significantly improves prediction speed.
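The abstract describes the two-stage idea only at a high level. The following is a minimal, illustrative PyTorch sketch of that idea (measure and record the Top-u query positions during training, reuse the recorded indices at prediction time), not the authors' implementation. The class name ProphetStyleSparseAttention, the parameter u, the cached_idx buffer, the batch-averaged index selection, and the max-minus-mean sparsity proxy borrowed from Informer's ProbSparse attention are all assumptions made for illustration.

```python
# Minimal sketch of the two-stage Top-u idea described in the abstract;
# names and details are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class ProphetStyleSparseAttention(nn.Module):
    def __init__(self, u: int):
        super().__init__()
        self.u = u               # number of "active" (Top-u) queries to keep
        self.cached_idx = None   # Top-u positions memorized during training

    def forward(self, q, k, v):
        # q, k, v: (batch, length, dim); assumes self-attention (equal lengths).
        B, L, D = q.shape
        scores = torch.matmul(q, k.transpose(-2, -1)) / D ** 0.5    # (B, L, L)

        if self.training or self.cached_idx is None:
            # Training stage: measure query sparsity with the max-minus-mean
            # proxy used by ProbSparse attention and memorize the Top-u
            # positions (aggregated over the batch here for simplicity).
            sparsity = scores.max(dim=-1).values - scores.mean(dim=-1)  # (B, L)
            self.cached_idx = sparsity.mean(dim=0).topk(self.u).indices.detach()

        # Prediction stage: reuse the memorized positions directly, skipping
        # the sparsity measurement in every forward pass.
        idx = self.cached_idx.unsqueeze(0).expand(B, -1)            # (B, u)

        # Attend only with the Top-u query rows.
        top_scores = torch.gather(
            scores, 1, idx.unsqueeze(-1).expand(-1, -1, L))         # (B, u, L)
        top_out = torch.matmul(torch.softmax(top_scores, dim=-1), v)  # (B, u, D)

        # Non-selected ("lazy") queries fall back to the mean of the values,
        # as in ProbSparse-style attention.
        out = v.mean(dim=1, keepdim=True).expand(B, L, D).clone()
        out.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, D), top_out)
        return out
```

In this sketch the prediction-time speed-up comes from skipping the per-forward sparsity measurement once the module is switched to eval mode; how the paper actually records and reuses the position indices may differ in detail.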
Pages: 256-267
Number of pages: 12
Related Papers
50 records in total
  • [1] An improved self-attention for long-sequence time-series data forecasting with missing values
    Zhang, Zhi-cheng
    Wang, Yong
    Peng, Jian-jian
    Duan, Jun-ting
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (08) : 3921 - 3940
  • [2] Enformer: Encoder-Based Sparse Periodic Self-Attention Time-Series Forecasting
    Wang, Na
    Zhao, Xianglian
    IEEE ACCESS, 2023, 11 : 112004 - 112014
  • [3] Sparse self-attention guided generative adversarial networks for time-series generation
    Ahmed, Nourhan
    Schmidt-Thieme, Lars
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2023, 16 (04) : 421 - 434
  • [4] Bridging Self-Attention and Time Series Decomposition for Periodic Forecasting
    Jiang, Song
    Syed, Tahin
    Zhu, Xuan
    Levy, Joshua
    Aronchik, Boris
    Sun, Yizhou
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 3202 - 3211
  • [5] Time-Series Forecasting Through Contrastive Learning with a Two-Dimensional Self-attention Mechanism
    Jiang, Linling
    Zhang, Fan
    Zhang, Mingli
    Zhang, Caiming
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT II, 2024, 14448 : 147 - 165
  • [6] Evaluating the effectiveness of self-attention mechanism in tuberculosis time series forecasting
    Lv, Zhihong
    Sun, Rui
    Liu, Xin
    Wang, Shuo
    Guo, Xiaowei
    Lv, Yuan
    Yao, Min
    Zhou, Junhua
    BMC INFECTIOUS DISEASES, 24 (1)
  • [7] DSANet: Dual Self-Attention Network for Multivariate Time Series Forecasting
    Huang, Siteng
    Wang, Donglin
    Wu, Xuehan
    Tang, Ao
    PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 2129 - 2132
  • [8] AD-autoformer: decomposition transformers with attention distilling for long sequence time-series forecasting
    Cao, Danyang
    Zhang, Shuai
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (14) : 21128 - 21148