FLatten Transformer: Vision Transformer using Focused Linear Attention

Cited by: 66
Authors
Han, Dongchen [1 ]
Pan, Xuran [1 ]
Han, Yizeng [1 ]
Song, Shiji [1 ]
Huang, Gao [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, BNRist, Beijing, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
DOI
10.1109/ICCV51070.2023.00548
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear complexity by approximating the Softmax operation through carefully designed mapping functions. However, current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead from the mapping functions. In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. Specifically, we first analyze the factors contributing to the performance degradation of linear attention from two perspectives: the focus ability and feature diversity. To overcome these limitations, we introduce a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity. Extensive experiments show that our linear attention module is applicable to a variety of advanced vision Transformers, and achieves consistently improved performances on multiple benchmarks. Code is available at https://github.com/LeapLabTHU/FLatten-Transformer.
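To illustrate the complexity claim in the abstract, below is a minimal NumPy sketch of generic kernelized linear attention versus standard Softmax attention. The ReLU-based feature map `phi` is a common placeholder used for illustration only, not the paper's focused mapping function; the point is the reassociation `phi(Q) (phi(K)^T V)`, which avoids the N x N score matrix and reduces cost from O(N^2 d) to O(N d^2).

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an N x N score matrix, O(N^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized attention: compute the (d, d) summary phi(K)^T V first,
    # so the cost is O(N * d^2) and no N x N matrix is ever formed.
    # phi is a positive feature map (here a ReLU stand-in, an assumption,
    # not the focused function proposed in the paper).
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                 # (d, d), independent of sequence length N
    Z = Qp @ Kp.sum(axis=0)       # per-query normalizer, shape (N,)
    return (Qp @ KV) / Z[:, None]

N, d = 64, 16
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (64, 16)
```

Because the per-query weights still sum to one, feeding a constant value matrix returns that constant, which is a quick sanity check that the normalization is correct.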
Pages: 5938 / 5948
Number of pages: 11
Related papers
50 records in total
  • [21] BViT: Broad Attention-Based Vision Transformer
    Li, Nannan
    Chen, Yaran
    Li, Weifan
    Ding, Zixiang
    Zhao, Dongbin
    Nie, Shuai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (09) : 12772 - 12783
  • [22] Efficient image analysis with triple attention vision transformer
    Li, Gehui
    Zhao, Tongtong
    PATTERN RECOGNITION, 2024, 150
  • [23] Hyperspectral image classification with embedded linear vision transformer
    Tan, Yunfei
    Li, Ming
    Yuan, Longfa
    Shi, Chaoshan
    Luo, Yonghang
    Wen, Guihao
    EARTH SCIENCE INFORMATICS, 2025, 18 (01)
  • [24] FrFT-based estimation of linear and nonlinear impairments using Vision Transformer
    Jiang, Ting
    Gao, Zheng
    Chen, Yizhao
    Hu, Zihe
    Tang, Ming
    JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING, 2024, 16 (03) : 419 - 431
  • [25] Transformer With Linear-Window Attention for Feature Matching
    Shen, Zhiwei
    Kong, Bin
    Dong, Xiaoyu
    IEEE ACCESS, 2023, 11 : 121202 - 121211
  • [26] ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
    He, Chenhang
    Li, Ruihuang
    Zhang, Guowen
    Zhang, Lei
    COMPUTER VISION - ECCV 2024, PT XXIX, 2025, 15087 : 74 - 92
  • [27] EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
    Liu, Xinyu
    Peng, Houwen
    Zheng, Ningxin
    Yang, Yuqing
    Hu, Han
    Yuan, Yixuan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 14420 - 14430
  • [28] BiFormer: Vision Transformer with Bi-Level Routing Attention
    Zhu, Lei
    Wang, Xinjiang
    Ke, Zhanghan
    Zhang, Wayne
    Lau, Rynson
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10323 - 10333
  • [29] SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
    Vani, Ankit
    Nguyen, Bac
    Lavoie, Samuel
    Krishna, Ranjay
    Courville, Aaron
    COMPUTER VISION - ECCV 2024, PT LXVI, 2025, 15124 : 233 - 251
  • [30] An Arrhythmia Classification Model Based on Vision Transformer with Deformable Attention
    Dong, Yanfang
    Zhang, Miao
    Qiu, Lishen
    Wang, Lirong
    Yu, Yong
    MICROMACHINES, 2023, 14 (06)