Grid-Attention: Enhancing Computational Efficiency of Large Vision Models Without Fine-Tuning

Cited by: 0
Authors
Li Pengyu [1]
Guo Tianchu [1]
Wang Biao [1]
Hua Xian-Sheng [1]
Affiliations
[1] Terminus Group, Terminus Labs, Beijing, People's Republic of China
Source
Keywords
DOI
10.1007/978-3-031-72973-7_4
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, transformer-based large vision models, e.g., the Segment Anything Model (SAM) and Stable Diffusion (SD), have achieved remarkable success in the computer vision field. However, the cost of the transformer's Multi-Head Attention (MHA), which is quadratic in the token count and hence quartic in input resolution, leads to substantial computation in these models, whose inputs and outputs are high-resolution. Although several prior works attempted to alleviate this challenge, none have successfully reduced the complexity and latency of large vision models while preserving their remarkable capabilities without requiring enormous effort and GPU hours to re-train or fine-tune the models. To address this challenge, we propose a simple yet effective plug-and-play transformer block called Grid-Attention (GridAttn). GridAttn integrates the proposed Grid Clustering module, Grid Distributing strategies, and Grid Recovering module with common MHA to enhance large vision models' computational efficiency while preserving their performance, without the need to re-train or fine-tune their parameters. We conduct extensive experiments on recent high-resolution tasks, including zero-shot instance segmentation (SAM, Expedit-SAM), text-to-image generation (Stable Diffusion V2.1), and semantic segmentation (SegFormer B0-B5). The experiments demonstrate that, without any training or fine-tuning, GridAttn reduces GFLOPs by 4.6%-16.1% and GPU inference latency by 8.2%-21.4%, while achieving equivalent performance (the performance bias ratio is less than 1%). Furthermore, the experiments show that GridAttn can also be trained from scratch or fine-tuned at very low cost, resulting in a significantly improved performance-efficiency trade-off. We therefore encourage the community to incorporate GridAttn whenever deploying a well-trained transformer directly, fine-tuning a pre-trained one, or training a new one from scratch. The source code will be released at https://github.com/pengyuLPY/GridAttn.
Pages: 54-70
Number of pages: 17
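The abstract describes the mechanism only at a high level: a Grid Clustering module, Grid Distributing strategies, and a Grid Recovering module wrapped around a standard MHA. The PyTorch snippet below is a minimal, hypothetical sketch of that plug-and-play pattern, not the authors' implementation: it average-pools tokens within each spatial grid cell, runs ordinary multi-head attention on the reduced token set, and broadcasts each cell's output back to the original token positions. The class name GridAttnSketch, the grid argument, and the pooling/broadcast choices are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GridAttnSketch(nn.Module):
    """Hypothetical plug-and-play block: grid clustering -> MHA -> grid recovering."""

    def __init__(self, dim: int, num_heads: int, grid: int = 2):
        super().__init__()
        self.grid = grid  # side length of each grid cell (assumed hyperparameter)
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, H*W, C) token sequence from a vision transformer block;
        # h and w are assumed divisible by the grid size.
        b, _, c = x.shape
        g = self.grid
        # Grid Clustering (assumed here as average pooling over each g x g cell).
        feat = x.transpose(1, 2).reshape(b, c, h, w)
        pooled = F.avg_pool2d(feat, kernel_size=g, stride=g)    # (B, C, H/g, W/g)
        tokens = pooled.flatten(2).transpose(1, 2)               # (B, H*W/g^2, C)
        # Standard MHA on the reduced token set.
        attn_out, _ = self.mha(tokens, tokens, tokens)
        # Grid Recovering (assumed here as nearest-neighbor broadcast back to H x W).
        out = attn_out.transpose(1, 2).reshape(b, c, h // g, w // g)
        out = F.interpolate(out, size=(h, w), mode="nearest")
        return out.flatten(2).transpose(1, 2)                    # (B, H*W, C)

if __name__ == "__main__":
    block = GridAttnSketch(dim=64, num_heads=4, grid=2)
    x = torch.randn(1, 32 * 32, 64)   # 32 x 32 feature map, 64 channels
    y = block(x, h=32, w=32)
    print(y.shape)                    # torch.Size([1, 1024, 64])

With grid size g, the number of tokens entering MHA shrinks by a factor of g^2, so the attention term's cost drops by roughly g^4, and the block adds no learnable parameters, which is what makes dropping such a block into a pre-trained model without fine-tuning plausible.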