Grid-Attention: Enhancing Computational Efficiency of Large Vision Models Without Fine-Tuning

Cited by: 0
Authors
Li Pengyu [1]
Guo Tianchu [1]
Wang Biao [1]
Hua Xian-Sheng [1]
Affiliations
[1] Terminus Group, Terminus Labs, Beijing, People's Republic of China
Source
Keywords
DOI
10.1007/978-3-031-72973-7_4
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, transformer-based large vision models, e.g., the Segment Anything Model (SAM) and Stable Diffusion (SD), have achieved remarkable success in the computer vision field. However, the cost of the transformer's Multi-Head Attention (MHA), which is quadratic in the token count and hence quartic in input resolution, leads to substantial computation in these models, whose inputs and outputs are high-resolution. Although several prior works attempted to alleviate this challenge, none have successfully reduced the complexity and latency of large vision models while preserving their remarkable capabilities without requiring enormous effort and GPU hours to re-train or fine-tune the models. To address this challenge, we propose a simple yet effective plug-and-play transformer block called Grid-Attention (GridAttn). GridAttn integrates the proposed Grid Clustering module, Grid Distributing strategies, and Grid Recovering module with common MHA to enhance large vision models' computational efficiency while preserving their performance, without the need to re-train or fine-tune their parameters. We conduct extensive experiments on recent high-resolution tasks, including zero-shot instance segmentation (SAM, Expedit-SAM), text-to-image generation (Stable Diffusion V2.1), and semantic segmentation (SegFormer B0-B5). The experiments demonstrate that, without any training or fine-tuning, GridAttn reduces GFLOPs by 4.6%-16.1% and GPU inference latency by 8.2%-21.4%, while achieving equivalent performance (the performance bias ratio is less than 1%). Furthermore, the experiments show that GridAttn can also be trained from scratch or fine-tuned at very low cost, resulting in a significantly improved performance-efficiency trade-off. We therefore encourage the community to incorporate GridAttn whenever deploying a well-trained transformer directly, fine-tuning a pre-trained one, or training a new one from scratch. The source code will be released at https://github.com/pengyuLPY/GridAttn.
Pages: 54-70
Number of pages: 17
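The abstract describes the mechanism only at a high level: a Grid Clustering module, Grid Distributing strategies, and a Grid Recovering module wrapped around a standard MHA. The PyTorch snippet below is a minimal, hypothetical sketch of that plug-and-play pattern, not the authors' implementation: it average-pools tokens within each spatial grid cell, runs ordinary multi-head attention on the reduced token set, and broadcasts each cell's output back to the original token positions. The class name GridAttnSketch, the grid argument, and the pooling/broadcast choices are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GridAttnSketch(nn.Module):
    """Hypothetical plug-and-play block: grid clustering -> MHA -> grid recovering."""

    def __init__(self, dim: int, num_heads: int, grid: int = 2):
        super().__init__()
        self.grid = grid  # side length of each grid cell (assumed hyperparameter)
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, H*W, C) token sequence from a vision transformer block;
        # h and w are assumed divisible by the grid size.
        b, _, c = x.shape
        g = self.grid
        # Grid Clustering (assumed here as average pooling over each g x g cell).
        feat = x.transpose(1, 2).reshape(b, c, h, w)
        pooled = F.avg_pool2d(feat, kernel_size=g, stride=g)    # (B, C, H/g, W/g)
        tokens = pooled.flatten(2).transpose(1, 2)               # (B, H*W/g^2, C)
        # Standard MHA on the reduced token set.
        attn_out, _ = self.mha(tokens, tokens, tokens)
        # Grid Recovering (assumed here as nearest-neighbor broadcast back to H x W).
        out = attn_out.transpose(1, 2).reshape(b, c, h // g, w // g)
        out = F.interpolate(out, size=(h, w), mode="nearest")
        return out.flatten(2).transpose(1, 2)                    # (B, H*W, C)

if __name__ == "__main__":
    block = GridAttnSketch(dim=64, num_heads=4, grid=2)
    x = torch.randn(1, 32 * 32, 64)   # 32 x 32 feature map, 64 channels
    y = block(x, h=32, w=32)
    print(y.shape)                    # torch.Size([1, 1024, 64])

With grid size g, the number of tokens entering MHA shrinks by a factor of g^2, so the attention term's cost drops by roughly g^4, and the block adds no learnable parameters, which is what makes dropping such a block into a pre-trained model without fine-tuning plausible.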