Grid-Attention: Enhancing Computational Efficiency of Large Vision Models Without Fine-Tuning

Cited by: 0
Authors
Li Pengyu [1 ]
Guo Tianchu [1 ]
Wang Biao [1 ]
Hua Xian-Sheng [1 ]
Affiliations
[1] Terminus Group, Terminus Labs, Beijing, People's Republic of China
DOI
10.1007/978-3-031-72973-7_4
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recently, transformer-based large vision models, e.g., the Segment Anything Model (SAM) and Stable Diffusion (SD), have achieved remarkable success in the computer vision field. However, the quartic complexity (with respect to input resolution) of the transformer's Multi-Head Attention (MHA) leads to substantial computational costs in these models, whose inputs and outputs are high-resolution. Although several prior works have attempted to alleviate this challenge, none have reduced the complexity and latency of large vision models while preserving their remarkable capabilities without enormous effort and GPU hours spent re-training or fine-tuning the models. To address this challenge, we propose a simple yet effective plug-and-play transformer block called Grid-Attention (GridAttn). GridAttn integrates the proposed Grid Clustering module, Grid Distributing strategies, and Grid Recovering module with common MHA to enhance large vision models' computational efficiency while preserving their performance, without the need to re-train or fine-tune their parameters. We conduct extensive experiments on recent high-resolution tasks, including zero-shot instance segmentation (SAM, Expedit-SAM), text-to-image generation (Stable Diffusion V2.1), and semantic segmentation (SegFormer B0-B5). The experiments demonstrate that, without any training or fine-tuning, GridAttn reduces GFLOPs by 4.6%-16.1% and GPU inference latency by 8.2%-21.4%, while achieving equivalent performance (the performance bias ratio is less than 1%). Furthermore, the experiments show that GridAttn can also be trained from scratch or fine-tuned at very low cost, yielding a significantly improved performance-efficiency tradeoff. As a recommendation, we encourage the community to incorporate GridAttn whenever deploying a well-trained transformer directly, fine-tuning a pre-trained one, or training a new one from scratch. The source code will be released at https://github.com/pengyuLPY/GridAttn.
Pages: 54-70
Number of pages: 17
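
Note: the abstract describes GridAttn only at a high level (a Grid Clustering module, Grid Distributing strategies, and a Grid Recovering module wrapped around common MHA); the exact operations are defined in the paper. The PyTorch sketch below is not the authors' method, only a minimal illustration of the general plug-and-play pattern under stated assumptions: tokens are grouped into grid cells before attention so the standard MHA sees fewer tokens, and the result is mapped back to the full resolution afterwards. The class and argument names (GridAttnSketch, grid_size) and the average-pool / nearest-neighbor choices are hypothetical.

```python
# Illustrative sketch only: average-pool tokens into grid cells ("clustering"),
# run standard MHA on the reduced token set, then broadcast the output back to
# the full grid ("recovering"). Assumes H and W are divisible by grid_size.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GridAttnSketch(nn.Module):
    """Hypothetical wrapper around nn.MultiheadAttention with grid pooling/recovery."""

    def __init__(self, dim: int, num_heads: int, grid_size: int = 2):
        super().__init__()
        self.grid_size = grid_size  # cell side length (hypothetical parameter)
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) token map from a vision transformer stage
        B, H, W, C = x.shape
        g = self.grid_size

        # "Grid Clustering" stand-in: average-pool each g x g cell into one token.
        pooled = F.avg_pool2d(x.permute(0, 3, 1, 2), kernel_size=g)  # (B, C, H/g, W/g)
        tokens = pooled.flatten(2).transpose(1, 2)                   # (B, HW/g^2, C)

        # Standard MHA now attends over ~1/g^2 as many tokens as before.
        attn_out, _ = self.mha(tokens, tokens, tokens)

        # "Grid Recovering" stand-in: broadcast each cell's output back to its
        # g x g positions and add it to the original tokens as a residual.
        recovered = attn_out.transpose(1, 2).reshape(B, C, H // g, W // g)
        recovered = F.interpolate(recovered, scale_factor=g, mode="nearest")
        return x + recovered.permute(0, 2, 3, 1)


if __name__ == "__main__":
    blk = GridAttnSketch(dim=64, num_heads=4, grid_size=2)
    feat = torch.randn(1, 32, 32, 64)   # 32 x 32 token grid
    print(blk(feat).shape)              # torch.Size([1, 32, 32, 64])
```

For intuition on the savings only: with a cell side of g, attention runs over HW/g^2 tokens instead of HW, so the cost of the attention itself, quadratic in token count, shrinks by roughly a factor of g^4 for that block. How the actual Grid Clustering, Distributing, and Recovering modules achieve their reductions without the accuracy loss a naive pooling like this would incur is detailed in the paper.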
Related papers
50 entries in total
  • [21] Distributed Inference and Fine-tuning of Large Language Models Over The Internet
    Borzunov, Alexander
    Ryabinin, Max
    Chumachenko, Artem
    Baranchuk, Dmitry
    Dettmers, Tim
    Belkada, Younes
    Samygin, Pavel
    Raffel, Colin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [22] Fine-Tuning Large Enterprise Language Models via Ontological Reasoning
    Baldazzi, Teodoro
    Bellomarini, Luigi
    Ceri, Stefano
    Colombo, Andrea
    Gentili, Andrea
    Sallinger, Emanuel
    RULES AND REASONING, RULEML+RR 2023, 2023, 14244 : 86 - 94
  • [23] Enhancing the security of edge-AI runtime environments: a fine-tuning method based on large language models
    Tang, Di
    Xiao, Peng
    Zheng, Tao
    Li, Xiang
    Yang, Cuibo
    WIRELESS NETWORKS, 2025, 31 (02) : 1825 - 1838
  • [24] Repeatability of Fine-Tuning Large Language Models Illustrated Using QLoRA
    Alahmari, Saeed S.
    Hall, Lawrence O.
    Mouton, Peter R.
    Goldgof, Dmitry B.
    IEEE ACCESS, 2024, 12 : 153221 - 153231
  • [25] Fine-tuning large language models for rare disease concept normalization
    Wang, Andy
    Liu, Cong
    Yang, Jingye
    Weng, Chunhua
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09) : 2076 - 2083
  • [26] Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework
    Hao, Sichong
    Shi, Xianjun
    Liu, Hongwei
    Shu, Yanjun
    2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION, ICSME, 2023, : 136 - 146
  • [27] Parameter-efficient fine-tuning of large language models using semantic knowledge tuning
    Prottasha, Nusrat Jahan
    Mahmud, Asif
    Sobuj, Md. Shohanur Islam
    Bhat, Prakash
    Kowsher, Md
    Yousefi, Niloofar
    Garibay, Ozlem Ozmen
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [28] Enhancing breast ultrasound segmentation through fine-tuning and optimization techniques: Sharp attention UNet
    Khaledyan, Donya
    Marini, Thomas J.
    Baran, Timothy M.
    O'Connell, Avice
    Parker, Kevin
    PLOS ONE, 2023, 18 (12):
  • [29] A survey of efficient fine-tuning methods for Vision-Language Models - Prompt and Adapter
    Xing, Jialu
    Liu, Jianping
    Wang, Jian
    Sun, Lulu
    Chen, Xi
    Gu, Xunxun
    Wang, Yingfei
    COMPUTERS & GRAPHICS-UK, 2024, 119
  • [30] Enhancing generalization in camera trap image recognition: Fine-tuning visual language models
    Yang, Zihe
    Tian, Ye
    Wang, Lifeng
    Zhang, Junguo
    NEUROCOMPUTING, 2025, 634