FLatten Transformer: Vision Transformer using Focused Linear Attention

Cited by: 66
Authors
Han, Dongchen [1]
Pan, Xuran [1]
Han, Yizeng [1]
Song, Shiji [1]
Huang, Gao [1]
Affiliations
[1] Tsinghua Univ, Dept Automat, BNRist, Beijing, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China
DOI
10.1109/ICCV51070.2023.00548
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear complexity by approximating the Softmax operation through carefully designed mapping functions. However, current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead from the mapping functions. In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. Specifically, we first analyze the factors contributing to the performance degradation of linear attention from two perspectives: the focus ability and feature diversity. To overcome these limitations, we introduce a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity. Extensive experiments show that our linear attention module is applicable to a variety of advanced vision Transformers, and achieves consistently improved performance on multiple benchmarks. Code is available at https://github.com/LeapLabTHU/FLatten-Transformer.
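For readers unfamiliar with the trade-off the abstract describes, below is a minimal PyTorch sketch contrasting quadratic softmax attention with generic kernelized linear attention. The mapping function phi used here (elu(x) + 1, following Katharopoulos et al., 2020) is a placeholder assumption, not the focused mapping function proposed in this paper, and the rank restoration module is likewise omitted.

import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard self-attention: materializes an (N, N) attention
    # matrix, hence O(N^2) in the token count N.
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v

def linear_attention(q, k, v):
    # Generic kernelized linear attention: phi(Q) (phi(K)^T V)
    # reorders the computation so the (d, d) matrix kv is built
    # once, giving O(N) complexity in N. phi = elu + 1 is a
    # placeholder, not the paper's focused mapping function.
    q, k = F.elu(q) + 1, F.elu(k) + 1      # non-negative feature maps
    kv = k.transpose(-2, -1) @ v           # (d, d), computed once
    z = q @ k.sum(dim=-2).unsqueeze(-1)    # (N, 1) normalizer
    return (q @ kv) / z

# Example: 196 tokens from a 14x14 patch grid, head dimension 64.
q = k = v = torch.randn(196, 64)
out = linear_attention(q, k, v)            # shape (196, 64)

The abstract's point is that the choice of phi determines how well this reordered computation retains softmax attention's focus ability and feature diversity; the paper's contribution is a mapping function and rank restoration step that recover both at linear cost.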
Pages: 5938-5948
Page count: 11
Related papers
50 records in total
  • [31] ViTVO: Vision Transformer based Visual Odometry with Attention Supervision
    Chiu, Chu-Chi
    Yang, Hsuan-Kung
    Chen, Hao-Wei
    Chen, Yu-Wen
    Lee, Chun-Yi
    2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, 2023,
  • [32] Vision Transformer Based on Reconfigurable Gaussian Self-attention
Zhao, L.
Zhou, J.-K.
Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (09) : 1976 - 1988
  • [33] Patch Attacks on Vision Transformer via Skip Attention Gradients
    Deng, Haoyu
    Fang, Yanmei
    Huang, Fangjun
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VIII, 2025, 15038 : 554 - 567
  • [34] Simultaneous Segmentation and Classification of Esophageal Lesions Using Attention Gating Pyramid Vision Transformer
    Ge, Peixuan
    Yan, Tao
    Wong, Pak Kin
    Li, Zheng
    Chan, In Neng
    Yu, Hon Ho
    Chan, Chon In
    Yao, Liang
    Hu, Ying
    Gao, Shan
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, : 1961 - 1975
  • [35] FAM: Improving columnar vision transformer with feature attention mechanism
    Huang, Lan
    Bai, Xingyu
    Zeng, Jia
    Yu, Mengqiang
    Pang, Wei
    Wang, Kangping
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 242
  • [36] Efficient agricultural pest classification using vision transformer with hybrid pooled multihead attention
Saranya, T.
Deisy, C.
Sridevi, S.
    Computers in Biology and Medicine, 2024, 177
  • [37] PPLA-Transformer: An Efficient Transformer for Defect Detection with Linear Attention Based on Pyramid Pooling
    Song, Xiaona
    Tian, Yubo
    Liu, Haichao
    Wang, Lijun
    Niu, Jinxing
    SENSORS, 2025, 25 (03)
  • [38] A New Structure for "Sen" Transformer Using Three Winding Linear Transformer
    Lailypour, Chia
    Farsadi, Murteza
    2016 21ST CONFERENCE ON ELECTRICAL POWER DISTRIBUTION NETWORKS CONFERENCE (EPDC), 2016, : 5 - 10
  • [39] ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention
    Dass, Jyotikrishna
    Wu, Shang
    Shi, Huihong
    Li, Chaojian
    Ye, Zhifan
    Wang, Zhongfeng
    Lin, Yingyan
    2023 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA, 2023, : 415 - 428
  • [40] Distinguishing Malicious Drones Using Vision Transformer
    Jamil, Sonain
    Abbas, Muhammad Sohail
    Roy, Arunabha M.
    AI, 2022, 3 (02) : 260 - 273