FLatten Transformer: Vision Transformer using Focused Linear Attention

Cited by: 66
Authors
Han, Dongchen [1 ]
Pan, Xuran [1 ]
Han, Yizeng [1 ]
Song, Shiji [1 ]
Huang, Gao [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, BNRist, Beijing, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
DOI
10.1109/ICCV51070.2023.00548
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear complexity by approximating the Softmax operation through carefully designed mapping functions. However, current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead from the mapping functions. In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. Specifically, we first analyze the factors contributing to the performance degradation of linear attention from two perspectives: the focus ability and feature diversity. To overcome these limitations, we introduce a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity. Extensive experiments show that our linear attention module is applicable to a variety of advanced vision Transformers, and achieves consistently improved performances on multiple benchmarks. Code is available at https://github.com/LeapLabTHU/FLatten-Transformer.
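To illustrate the complexity claim in the abstract, below is a minimal NumPy sketch of generic kernelized linear attention versus standard Softmax attention. The ReLU-based feature map `phi` is a common placeholder used for illustration only, not the paper's focused mapping function; the point is the reassociation `phi(Q) (phi(K)^T V)`, which avoids the N x N score matrix and reduces cost from O(N^2 d) to O(N d^2).

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an N x N score matrix, O(N^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized attention: compute the (d, d) summary phi(K)^T V first,
    # so the cost is O(N * d^2) and no N x N matrix is ever formed.
    # phi is a positive feature map (here a ReLU stand-in, an assumption,
    # not the focused function proposed in the paper).
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                 # (d, d), independent of sequence length N
    Z = Qp @ Kp.sum(axis=0)       # per-query normalizer, shape (N,)
    return (Qp @ KV) / Z[:, None]

N, d = 64, 16
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (64, 16)
```

Because the per-query weights still sum to one, feeding a constant value matrix returns that constant, which is a quick sanity check that the normalization is correct.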
Pages: 5938 / 5948
Number of pages: 11
Related papers
50 records in total
  • [21] BViT: Broad Attention-Based Vision Transformer
    Li, Nannan
    Chen, Yaran
    Li, Weifan
    Ding, Zixiang
    Zhao, Dongbin
    Nie, Shuai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (09) : 12772 - 12783
  • [22] Efficient image analysis with triple attention vision transformer
    Li, Gehui
    Zhao, Tongtong
    PATTERN RECOGNITION, 2024, 150
  • [23] Hyperspectral image classification with embedded linear vision transformer
    Tan, Yunfei
    Li, Ming
    Yuan, Longfa
    Shi, Chaoshan
    Luo, Yonghang
    Wen, Guihao
    EARTH SCIENCE INFORMATICS, 2025, 18 (01)
  • [24] FrFT-based estimation of linear and nonlinear impairments using Vision Transformer
    Jiang, Ting
    Gao, Zheng
    Chen, Yizhao
    Hu, Zihe
    Tang, Ming
    JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING, 2024, 16 (03) : 419 - 431
  • [25] Transformer With Linear-Window Attention for Feature Matching
    Shen, Zhiwei
    Kong, Bin
    Dong, Xiaoyu
    IEEE ACCESS, 2023, 11 : 121202 - 121211
  • [26] ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
    He, Chenhang
    Li, Ruihuang
    Zhang, Guowen
    Zhang, Lei
    COMPUTER VISION - ECCV 2024, PT XXIX, 2025, 15087 : 74 - 92
  • [27] EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
    Liu, Xinyu
    Peng, Houwen
    Zheng, Ningxin
    Yang, Yuqing
    Hu, Han
    Yuan, Yixuan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 14420 - 14430
  • [28] BiFormer: Vision Transformer with Bi-Level Routing Attention
    Zhu, Lei
    Wang, Xinjiang
    Ke, Zhanghan
    Zhang, Wayne
    Lau, Rynson
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10323 - 10333
  • [29] SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
    Vani, Ankit
    Nguyen, Bac
    Lavoie, Samuel
    Krishna, Ranjay
    Courville, Aaron
    COMPUTER VISION - ECCV 2024, PT LXVI, 2025, 15124 : 233 - 251
  • [30] An Arrhythmia Classification Model Based on Vision Transformer with Deformable Attention
    Dong, Yanfang
    Zhang, Miao
    Qiu, Lishen
    Wang, Lirong
    Yu, Yong
    MICROMACHINES, 2023, 14 (06)