Efficient Attention: Attention with Linear Complexities

Cited by: 285
Authors
Shen Zhuoran [1]
Zhang Mingyuan [2]
Zhao Haiyu [2]
Yi Shuai [2]
Li Hongsheng [3]
Affiliations
[1] 4244 Univ Way NE 85406, Seattle, WA 98105 USA
[2] SenseTime Int, 182 Cecil St,36-02 Frasers Tower, Singapore 069547, Singapore
[3] Chinese Univ Hong Kong, Sha Tin, Hong Kong, Peoples R China
DOI
10.1109/WACV48630.2021.00357
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Dot-product attention has wide applications in computer vision and natural language processing. However, its memory and computational costs grow quadratically with the input size, which prohibits its application to high-resolution inputs. To remedy this drawback, this paper proposes a novel efficient attention mechanism that is equivalent to dot-product attention but has substantially lower memory and computational costs. Its resource efficiency allows more widespread and flexible integration of attention modules into a network, which leads to better accuracies. Empirical evaluations demonstrated these advantages: efficient attention modules brought significant performance boosts to object detectors and instance segmenters on MS-COCO 2017. Further, the resource efficiency extends attention to complex models, where high costs previously prohibited the use of dot-product attention. As an exemplar, a model with efficient attention achieved state-of-the-art accuracy for stereo depth estimation on the Scene Flow dataset. Code is available at https://github.com/cmsflash/efficient-attention.
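The mechanism summarized in the abstract is compact enough to sketch. Below is a minimal single-head PyTorch sketch of the idea, assuming (n, d) query/key/value matrices; the function names and the unscaled, single-head formulation are illustrative assumptions, not the repository's actual module. The key reordering: softmax is applied to the queries over the channel dimension and to the keys over the position dimension, so K^T V can be aggregated first into a small (d_k, d_v) context matrix and the quadratic (n, n) attention map is never materialized.

    import torch
    import torch.nn.functional as F

    def dot_product_attention(q, k, v):
        # Standard attention: softmax(Q K^T) V. The (n, n) attention map
        # makes memory and compute grow quadratically with input size n.
        # (The usual 1/sqrt(d_k) scaling is omitted for brevity.)
        attn = F.softmax(q @ k.transpose(-2, -1), dim=-1)  # (n, n)
        return attn @ v                                    # (n, d_v)

    def efficient_attention(q, k, v):
        # Efficient attention: apply softmax to Q over the channel axis
        # and to K over the position axis, then aggregate K^T V first.
        # The (d_k, d_v) context matrix is independent of n, so cost is
        # linear in the number of positions.
        q = F.softmax(q, dim=-1)            # each query sums to 1 over d_k
        k = F.softmax(k, dim=-2)            # each key channel sums to 1 over n
        context = k.transpose(-2, -1) @ v   # (d_k, d_v) global context
        return q @ context                  # (n, d_v)

    # Shape check on random single-head inputs.
    n, d_k, d_v = 1024, 64, 64
    q, k, v = torch.randn(n, d_k), torch.randn(n, d_k), torch.randn(n, d_v)
    assert efficient_attention(q, k, v).shape == (n, d_v)

With softmax normalization the two functions are close but not numerically identical; the paper's claim is that the reordered computation preserves the behavior of attention while reducing the dominant cost from quadratic to linear in the number of positions, which is what enables the high-resolution applications mentioned above.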
Pages: 3530 - 3538
Page count: 9
Related Papers
50 items in total (items [41]-[50] shown)
  • [41] Generalizable Multi-Linear Attention Network
    Jin, Tao
    Zhao, Zhou
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [42] Linear normalization attention neural Hawkes process
    Song, Zhi-yan
    Liu, Jian-wei
    Yang, Jie
    Zhang, Lu-ning
    NEURAL COMPUTING AND APPLICATIONS, 2023, 35 : 1025 - 1039
  • [43] Linear Split Attention for Pavement Crack Detection
    Yan, Guoliang
    Ni, Chenyin
    ARTIFICIAL INTELLIGENCE AND ROBOTICS, ISAIR 2022, PT II, 2022, 1701 : 66 - 80
  • [44] Prophet Attention: Predicting Attention with Future Attention
    Liu, Fenglin
    Ren, Xuancheng
    Wu, Xian
    Ge, Shen
    Fan, Wei
    Zou, Yuexian
    Sun, Xu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [45] Triplet attention fusion module: A concise and efficient channel attention module for medical image segmentation
    Wu, Yanlin
    Wang, Guanglei
    Wang, Zhongyang
    Wang, Hongrui
    Li, Yan
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 82
  • [46] The focus of attention
    Yates, Darran
    NATURE REVIEWS NEUROSCIENCE, 2012, 13 (10) : 666 - 666
  • [47] Places: Invite attention, direct attention, reward attention, require attention. At their best, they sustain attention
    Lyndon, D
    PLACES-A FORUM OF ENVIRONMENTAL DESIGN, 1999, 12 (03) : 2 - 3
  • [48] 1-Scaled-attention: A novel fast attention mechanism for efficient modeling of protein sequences
    Ranjan, Ashish
    Fahad, Md Shah
    Deepak, Akshay
    INFORMATION SCIENCES, 2022, 609 : 1098 - 1112
  • [49] PatchFormer: An Efficient Point Transformer with Patch Attention
    Zhang, Cheng
    Wan, Haocheng
    Shen, Xinyi
    Wu, Zizhao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11789 - 11798
  • [50] DataDAM: Efficient Dataset Distillation with Attention Matching
    Sajedi, Ahmad
    Khaki, Samir
    Amjadian, Ehsan
    Liu, Lucy Z.
    Lawryshyn, Yuri A.
    Plataniotis, Konstantinos N.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 17051 - 17061