Grid-Attention: Enhancing Computational Efficiency of Large Vision Models Without Fine-Tuning

Cited: 0
Authors
Li Pengyu [1 ]
Guo Tianchu [1 ]
Wang Biao [1 ]
Hua Xian-Sheng [1 ]
Affiliations
[1] Terminus Group, Terminus Labs, Beijing, People's Republic of China
Source
Keywords
DOI
10.1007/978-3-031-72973-7_4
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recently, transformer-based large vision models, e.g., the Segment Anything Model (SAM) and Stable Diffusion (SD), have achieved remarkable success in the computer vision field. However, the complexity of the transformer's Multi-Head Attention (MHA) is quartic in spatial resolution, which leads to substantial computational costs in these models, whose inputs and outputs are high-resolution. Although several prior works have attempted to alleviate this challenge, none have reduced the complexity and latency of large vision models while preserving their capabilities without enormous effort and GPU hours to re-train or fine-tune the models. To address this challenge, we propose a simple yet effective plug-and-play transformer block called Grid-Attention (GridAttn). GridAttn integrates the proposed Grid Clustering module, Grid Distributing strategies, and Grid Recovering module with common MHA to enhance large vision models' computational efficiency while preserving their performance, without re-training or fine-tuning their parameters. We conduct extensive experiments on recent high-resolution tasks, including zero-shot instance segmentation (SAM, Expedit-SAM), text-to-image generation (Stable Diffusion V2.1), and semantic segmentation (SegFormer B0-B5). The experiments demonstrate that, without any training or fine-tuning, GridAttn reduces GFLOPs by 4.6% to 16.1% and GPU inference latency by 8.2% to 21.4%, while achieving equivalent performance (the performance deviation is less than 1%). Furthermore, the experiments show that GridAttn can also be trained from scratch or fine-tuned at very low cost, resulting in a significantly improved performance-efficiency tradeoff. We encourage the community to incorporate GridAttn whenever deploying a well-trained transformer directly, fine-tuning a pre-trained one, or training a new one from scratch. The source code will be released at https://github.com/pengyuLPY/GridAttn.
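The abstract describes GridAttn as a wrapper around an unmodified, pre-trained MHA: a Grid Clustering module shortens the token sequence, attention runs on the shortened sequence, and a Grid Recovering module maps the result back to the original token positions. The paper's exact clustering, distributing, and recovering strategies are not given in this record, so the sketch below only illustrates the general pattern under assumed details: clustering is approximated by average pooling over non-overlapping grid cells, recovering by nearest-neighbor broadcast, and the class name GridAttnSketch and its interface are hypothetical rather than the authors' implementation.

```python
# Minimal sketch of a GridAttn-style plug-and-play block (assumed details,
# not the authors' exact method). The wrapped nn.MultiheadAttention is reused
# as-is, so no re-training or fine-tuning of its weights is required.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GridAttnSketch(nn.Module):
    def __init__(self, mha: nn.MultiheadAttention, grid_size: int = 2):
        super().__init__()
        # The wrapped MHA is assumed to be built with batch_first=True.
        self.mha = mha
        self.grid_size = grid_size

    def forward(self, x: torch.Tensor, hw: tuple) -> torch.Tensor:
        # x: (B, N, C) token sequence from a vision transformer, N = H * W.
        # H and W are assumed divisible by grid_size.
        B, N, C = x.shape
        H, W = hw
        g = self.grid_size

        # "Grid Clustering" (assumed): average spatially adjacent tokens
        # within each g x g cell, shrinking the sequence by a factor of g**2.
        feat = x.transpose(1, 2).reshape(B, C, H, W)          # (B, C, H, W)
        cells = F.avg_pool2d(feat, kernel_size=g, stride=g)   # (B, C, H/g, W/g)
        Hc, Wc = cells.shape[-2:]
        cells = cells.flatten(2).transpose(1, 2)               # (B, Hc*Wc, C)

        # Run the unchanged MHA on the shorter sequence; the quadratic
        # attention cost drops by roughly g**4.
        attn_out, _ = self.mha(cells, cells, cells, need_weights=False)

        # "Grid Recovering" (assumed): broadcast each cell's output back to
        # the tokens it was pooled from, restoring the original resolution.
        attn_out = attn_out.transpose(1, 2).reshape(B, C, Hc, Wc)
        recovered = F.interpolate(attn_out, scale_factor=g, mode="nearest")
        return recovered.flatten(2).transpose(1, 2)            # (B, N, C)
```

A usage example under the same assumptions: wrapping a pre-trained attention layer and feeding it a 64 x 64 token grid. With grid_size=2 the token count falls by 4x, so the quadratic attention term shrinks by about 16x, which is the kind of GFLOPs and latency reduction the abstract targets without touching the MHA weights.

```python
mha = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
block = GridAttnSketch(mha, grid_size=2)
tokens = torch.randn(1, 64 * 64, 256)   # tokens of a 64 x 64 feature map
out = block(tokens, hw=(64, 64))        # same shape, ~16x fewer attention FLOPs
```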
Pages: 54-70
Page count: 17