Efficient Attention: Attention with Linear Complexities

Cited by: 285
Authors
Shen Zhuoran [1]
Zhang Mingyuan [2]
Zhao Haiyu [2]
Yi Shuai [2]
Li Hongsheng [3]
Affiliations
[1] 4244 Univ Way NE 85406, Seattle, WA 98105 USA
[2] SenseTime Int, 182 Cecil St,36-02 Frasers Tower, Singapore 069547, Singapore
[3] Chinese Univ Hong Kong, Sha Tin, Hong Kong, Peoples R China
DOI
10.1109/WACV48630.2021.00357
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Dot-product attention has wide applications in computer vision and natural language processing. However, its memory and computational costs grow quadratically with the input size, which prohibits its application to high-resolution inputs. To remedy this drawback, this paper proposes a novel efficient attention mechanism that is equivalent to dot-product attention but has substantially lower memory and computational costs. Its resource efficiency allows more widespread and flexible integration of attention modules into a network, which leads to better accuracies. Empirical evaluations demonstrated these advantages: efficient attention modules brought significant performance boosts to object detectors and instance segmenters on MS-COCO 2017. Further, the resource efficiency extends attention to complex models, where high costs previously prohibited the use of dot-product attention. As an exemplar, a model with efficient attention achieved state-of-the-art accuracy for stereo depth estimation on the Scene Flow dataset. Code is available at https://github.com/cmsflash/efficient-attention.
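The mechanism summarized in the abstract is compact enough to sketch. Below is a minimal single-head PyTorch sketch of the idea, assuming (n, d) query/key/value matrices; the function names and the unscaled, single-head formulation are illustrative assumptions, not the repository's actual module. The key reordering: softmax is applied to the queries over the channel dimension and to the keys over the position dimension, so K^T V can be aggregated first into a small (d_k, d_v) context matrix and the quadratic (n, n) attention map is never materialized.

    import torch
    import torch.nn.functional as F

    def dot_product_attention(q, k, v):
        # Standard attention: softmax(Q K^T) V. The (n, n) attention map
        # makes memory and compute grow quadratically with input size n.
        # (The usual 1/sqrt(d_k) scaling is omitted for brevity.)
        attn = F.softmax(q @ k.transpose(-2, -1), dim=-1)  # (n, n)
        return attn @ v                                    # (n, d_v)

    def efficient_attention(q, k, v):
        # Efficient attention: apply softmax to Q over the channel axis
        # and to K over the position axis, then aggregate K^T V first.
        # The (d_k, d_v) context matrix is independent of n, so cost is
        # linear in the number of positions.
        q = F.softmax(q, dim=-1)            # each query sums to 1 over d_k
        k = F.softmax(k, dim=-2)            # each key channel sums to 1 over n
        context = k.transpose(-2, -1) @ v   # (d_k, d_v) global context
        return q @ context                  # (n, d_v)

    # Shape check on random single-head inputs.
    n, d_k, d_v = 1024, 64, 64
    q, k, v = torch.randn(n, d_k), torch.randn(n, d_k), torch.randn(n, d_v)
    assert efficient_attention(q, k, v).shape == (n, d_v)

With softmax normalization the two functions are close but not numerically identical; the paper's claim is that the reordered computation preserves the behavior of attention while reducing the dominant cost from quadratic to linear in the number of positions, which is what enables the high-resolution applications mentioned above.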
Pages: 3530 - 3538
Page count: 9
Related Papers
50 items in total (items [41]-[50] shown)
  • [41] Generalizable Multi-Linear Attention Network
    Jin, Tao
    Zhao, Zhou
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [42] Linear normalization attention neural Hawkes process
    Song, Zhi-yan
    Liu, Jian-wei
    Yang, Jie
    Zhang, Lu-ning
    NEURAL COMPUTING AND APPLICATIONS, 2023, 35 : 1025 - 1039
  • [43] Linear Split Attention for Pavement Crack Detection
    Yan, Guoliang
    Ni, Chenyin
    ARTIFICIAL INTELLIGENCE AND ROBOTICS, ISAIR 2022, PT II, 2022, 1701 : 66 - 80
  • [44] Prophet Attention: Predicting Attention with Future Attention
    Liu, Fenglin
    Ren, Xuancheng
    Wu, Xian
    Ge, Shen
    Fan, Wei
    Zou, Yuexian
    Sun, Xu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [45] Triplet attention fusion module: A concise and efficient channel attention module for medical image segmentation
    Wu, Yanlin
    Wang, Guanglei
    Wang, Zhongyang
    Wang, Hongrui
    Li, Yan
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 82
  • [46] The focus of attention
    Yates, Darran
    NATURE REVIEWS NEUROSCIENCE, 2012, 13 (10) : 666 - 666
  • [47] Places: Invite attention, direct attention, reward attention, require attention. At their best, they sustain attention
    Lyndon, D
    PLACES-A FORUM OF ENVIRONMENTAL DESIGN, 1999, 12 (03) : 2 - 3
  • [48] 1-Scaled-attention: A novel fast attention mechanism for efficient modeling of protein sequences
    Ranjan, Ashish
    Fahad, Md Shah
    Deepak, Akshay
    INFORMATION SCIENCES, 2022, 609 : 1098 - 1112
  • [49] PatchFormer: An Efficient Point Transformer with Patch Attention
    Zhang, Cheng
    Wan, Haocheng
    Shen, Xinyi
    Wu, Zizhao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11789 - 11798
  • [50] DataDAM: Efficient Dataset Distillation with Attention Matching
    Sajedi, Ahmad
    Khaki, Samir
    Amjadian, Ehsan
    Liu, Lucy Z.
    Lawryshyn, Yuri A.
    Plataniotis, Konstantinos N.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 17051 - 17061