Grid-Attention: Enhancing Computational Efficiency of Large Vision Models Without Fine-Tuning

Cited: 0
Authors
Li Pengyu [1 ]
Guo Tianchu [1 ]
Wang Biao [1 ]
Hua Xian-Sheng [1 ]
Affiliations
[1] Terminus Group, Terminus Labs, Beijing, People's Republic of China
Source
Keywords
DOI
10.1007/978-3-031-72973-7_4
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recently, transformer-based large vision models, e.g., the Segment Anything Model (SAM) and Stable Diffusion (SD), have achieved remarkable success in the computer vision field. However, the complexity of the transformer's Multi-Head Attention (MHA) is quartic in spatial resolution, which leads to substantial computational costs in these models, whose inputs and outputs are high-resolution. Although several prior works have attempted to alleviate this challenge, none have reduced the complexity and latency of large vision models while preserving their capabilities without enormous effort and GPU hours to re-train or fine-tune the models. To address this challenge, we propose a simple yet effective plug-and-play transformer block called Grid-Attention (GridAttn). GridAttn integrates the proposed Grid Clustering module, Grid Distributing strategies, and Grid Recovering module with common MHA to enhance large vision models' computational efficiency while preserving their performance, without re-training or fine-tuning their parameters. We conduct extensive experiments on recent high-resolution tasks, including zero-shot instance segmentation (SAM, Expedit-SAM), text-to-image generation (Stable Diffusion V2.1), and semantic segmentation (SegFormer B0-B5). The experiments demonstrate that, without any training or fine-tuning, GridAttn reduces GFLOPs by 4.6% to 16.1% and GPU inference latency by 8.2% to 21.4%, while achieving equivalent performance (the performance deviation is less than 1%). Furthermore, the experiments show that GridAttn can also be trained from scratch or fine-tuned at very low cost, resulting in a significantly improved performance-efficiency tradeoff. We encourage the community to incorporate GridAttn whenever deploying a well-trained transformer directly, fine-tuning a pre-trained one, or training a new one from scratch. The source code will be released at https://github.com/pengyuLPY/GridAttn.
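The abstract describes GridAttn as a wrapper around an unmodified, pre-trained MHA: a Grid Clustering module shortens the token sequence, attention runs on the shortened sequence, and a Grid Recovering module maps the result back to the original token positions. The paper's exact clustering, distributing, and recovering strategies are not given in this record, so the sketch below only illustrates the general pattern under assumed details: clustering is approximated by average pooling over non-overlapping grid cells, recovering by nearest-neighbor broadcast, and the class name GridAttnSketch and its interface are hypothetical rather than the authors' implementation.

```python
# Minimal sketch of a GridAttn-style plug-and-play block (assumed details,
# not the authors' exact method). The wrapped nn.MultiheadAttention is reused
# as-is, so no re-training or fine-tuning of its weights is required.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GridAttnSketch(nn.Module):
    def __init__(self, mha: nn.MultiheadAttention, grid_size: int = 2):
        super().__init__()
        # The wrapped MHA is assumed to be built with batch_first=True.
        self.mha = mha
        self.grid_size = grid_size

    def forward(self, x: torch.Tensor, hw: tuple) -> torch.Tensor:
        # x: (B, N, C) token sequence from a vision transformer, N = H * W.
        # H and W are assumed divisible by grid_size.
        B, N, C = x.shape
        H, W = hw
        g = self.grid_size

        # "Grid Clustering" (assumed): average spatially adjacent tokens
        # within each g x g cell, shrinking the sequence by a factor of g**2.
        feat = x.transpose(1, 2).reshape(B, C, H, W)          # (B, C, H, W)
        cells = F.avg_pool2d(feat, kernel_size=g, stride=g)   # (B, C, H/g, W/g)
        Hc, Wc = cells.shape[-2:]
        cells = cells.flatten(2).transpose(1, 2)               # (B, Hc*Wc, C)

        # Run the unchanged MHA on the shorter sequence; the quadratic
        # attention cost drops by roughly g**4.
        attn_out, _ = self.mha(cells, cells, cells, need_weights=False)

        # "Grid Recovering" (assumed): broadcast each cell's output back to
        # the tokens it was pooled from, restoring the original resolution.
        attn_out = attn_out.transpose(1, 2).reshape(B, C, Hc, Wc)
        recovered = F.interpolate(attn_out, scale_factor=g, mode="nearest")
        return recovered.flatten(2).transpose(1, 2)            # (B, N, C)
```

A usage example under the same assumptions: wrapping a pre-trained attention layer and feeding it a 64 x 64 token grid. With grid_size=2 the token count falls by 4x, so the quadratic attention term shrinks by about 16x, which is the kind of GFLOPs and latency reduction the abstract targets without touching the MHA weights.

```python
mha = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
block = GridAttnSketch(mha, grid_size=2)
tokens = torch.randn(1, 64 * 64, 256)   # tokens of a 64 x 64 feature map
out = block(tokens, hw=(64, 64))        # same shape, ~16x fewer attention FLOPs
```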
Pages: 54-70
Page count: 17