Adaptive Multi-Resolution Attention with Linear Complexity

Cited by: 0
Authors
Zhang, Yao [1]
Ma, Yunpu [1]
Seidl, Thomas [1]
Tresp, Volker [1,2]
Affiliations
[1] Ludwig Maximilians Univ Munchen, Inst Informat, Munich, Germany
[2] Siemens AG, Corp Technol, Munich, Germany
Keywords
DOI
10.1109/IJCNN54540.2023.10191567
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformers have improved the state of the art across numerous tasks in sequence modeling. Besides its quadratic computational and memory complexity with respect to sequence length, the self-attention mechanism processes information at only a single scale, i.e., all attention heads operate at the same resolution, which limits the power of the Transformer. To remedy this, we propose a novel and efficient structure named Adaptive Multi-Resolution Attention (AdaMRA for short), which scales linearly with sequence length in both time and space. Specifically, we leverage a multi-resolution multi-head attention mechanism, enabling attention heads to capture long-range contextual information in a coarse-to-fine fashion. Moreover, to capture the potential relations between the query representation and clues of different attention granularities, we leave the decision of which attention resolution to use to the query, which further improves the model's capacity compared to the vanilla Transformer. To reduce complexity, we adopt kernel attention without degrading performance. Extensive experiments demonstrate the effectiveness and efficiency of our model, which achieves a state-of-the-art speed-memory-accuracy trade-off. To facilitate the use of AdaMRA by the scientific community, the implementation will be made publicly available.
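The abstract names two ingredients, kernel attention with linear cost and attention at a coarser resolution, without giving formulas. The sketch below is not the authors' implementation. It illustrates the general idea under stated assumptions: the feature map `phi(x) = elu(x) + 1`, the average pooling used to coarsen keys and values, and all function names are hypothetical choices made for illustration only.

```python
import math

def phi(x):
    # Positive feature map elu(x) + 1 (an assumed choice, not taken from the abstract).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def avg_pool(rows, stride):
    # Coarsen a sequence by averaging consecutive groups of `stride` rows (assumed pooling).
    pooled = []
    for start in range(0, len(rows), stride):
        group = rows[start:start + stride]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled

def linear_attention(Q, K, V):
    # Build S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j) in one pass over the keys.
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
    # Each query reads only the fixed-size summaries, so cost grows linearly with length.
    out = []
    for q in Q:
        fq = phi(q)
        norm = sum(fq[a] * z[a] for a in range(d_k))
        out.append([sum(fq[a] * S[a][b] for a in range(d_k)) / norm for b in range(d_v)])
    return out

def quadratic_reference(Q, K, V):
    # The same kernel attention computed pairwise (quadratic cost), used only as a check.
    out = []
    for q in Q:
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(k))) for k in K]
        total = sum(w)
        out.append([sum(wi * v[b] for wi, v in zip(w, V)) / total for b in range(len(V[0]))])
    return out

Q = [[0.1, -0.3], [0.5, 0.2], [-0.4, 0.7], [0.0, 0.9]]
K = [[0.2, 0.1], [-0.6, 0.4], [0.3, -0.2], [0.8, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]

fine = linear_attention(Q, K, V)                             # full-resolution keys and values
coarse = linear_attention(Q, avg_pool(K, 2), avg_pool(V, 2))  # half-resolution keys and values
```

Here `fine` and `coarse` stand in for heads attending at two resolutions. How a query chooses between resolutions is not specified in the abstract, so the sketch leaves that routing step out.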
Pages: 8
Related papers
50 results in total
  • [41] A HYBRID RECONSTRUCTION STRATEGY FOR FULLY ADAPTIVE MULTI-RESOLUTION SCHEME
    Deng Feng
    Wu Yi-Zhao
    [J]. MODERN PHYSICS LETTERS B, 2009, 23 (03): : 333 - 336
  • [42] Image registration based on residual mixed attention and multi-resolution constraints
    Zhang, Mingna
    Lü, Xiaoqi
    Gu, Yu
    [J]. Guangxue Jingmi Gongcheng/Optics and Precision Engineering, 2022, 30 (10): : 1203 - 1216
  • [43] A multi-resolution convolutional attention network for efficient diabetic retinopathy classification
    Madarapu, Sandeep
    Ari, Samit
    Mahapatra, Kamalakanta
    [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2024, 117
  • [44] Multi-Resolution Deblurring
    McLaughlin, Michael J.
    Lin, En-Ui
    Ezekiel, Soundararajan
    Blasch, Erik
    Bubalo, Adnan
    Cornacchia, Maria
    Alford, Mark
    Thomas, Millicent
    [J]. 2014 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR), 2014,
  • [45] Multi-scale Attention Aided Multi-Resolution Network for Human Pose Estimation
    Selvam, Srinika
    Mishra, Deepak
    [J]. PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2019, PT I, 2019, 11941 : 461 - 472
  • [46] MULTI-RESOLUTION RELAXATION
    NARAYANAN, KA
    OLEARY, DP
    ROSENFELD, A
    [J]. PATTERN RECOGNITION, 1983, 16 (02) : 223 - 230
  • [47] High-order finite volume multi-resolution WENO schemes with adaptive linear weights on triangular meshes
    Lin, Yicheng
    Zhu, Jun
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2024, 506
  • [48] On multi-resolution and variable-resolution
    Li, ZN
    [J]. INFORMATION INTELLIGENCE AND SYSTEMS, VOLS 1-4, 1996, : 719 - 724
  • [49] Multi-Resolution Supervision Network with an Adaptive Weighted Loss for Desert Segmentation
    Wang, Lexuan
    Weng, Liguo
    Xia, Min
    Liu, Jia
    Lin, Haifeng
    [J]. REMOTE SENSING, 2021, 13 (11)
  • [50] An adaptive multi-resolution algorithm for motion estimation in medical image sequences
    Grava, Cristian
    Gay-Bellile, Vincent
    Bartoli, Adrien
    Lavest, Jean-Marc
    Buzuloiu, Vasile
    [J]. 2007 EUROPEAN CONFERENCE ON CIRCUIT THEORY AND DESIGN, VOLS 1-3, 2007, : 507 - +