Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs

Cited by: 0
Authors:
Song, Lin [1 ]
Chen, Yukang [3 ]
Yang, Shuai [2 ]
Ding, Xiaohan [1 ]
Ge, Yixiao [1 ]
Chen, Ying-Cong [2 ]
Shan, Ying [1 ]
Affiliations:
[1] Tencent AI Lab, Shenzhen, Peoples R China
[2] HKUST GZ, Guangzhou, Peoples R China
[3] CUHK, Hong Kong, Peoples R China
DOI: 10.1109/CVPR52733.2024.01306
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract:
This paper addresses the high computational complexity of self-attention in Large Language Models (LLMs), a significant challenge in both natural language processing (NLP) and multi-modal tasks. We propose Low-Rank Approximation for Sparse Attention (LoRA-Sparse), an approach that strategically reduces this complexity. LoRA-Sparse introduces low-rank linear projection layers that approximate sparse attention, trained with an order-mimic methodology that is crucial for efficiently approximating the self-attention mechanism in LLMs. We empirically show that sparse attention not only reduces computational demands but also improves model performance in both NLP and multi-modal tasks, which suggests, perhaps surprisingly, that much of the attention in LLMs is redundant and not beneficial. We validate LoRA-Sparse through extensive empirical studies on both NLP and multi-modal tasks, demonstrating its effectiveness and general applicability. Based on LLaMA and LLaVA models, our method eliminates more than half of the self-attention computation while achieving even better performance than full-attention baselines.
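The abstract describes the mechanism only at a high level, so the following is a minimal PyTorch sketch of the idea as stated there: cheap low-rank query/key projections approximate the attention-score matrix, the approximation selects which keys each query attends to, and an "order-mimic" objective trains the low-rank scores against full attention. The function names, rank r, keep_ratio, and the KL-based formulation of the order-mimic loss are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the LoRA-Sparse idea from the abstract.
# Assumptions (not from the paper): names, rank r, keep_ratio, and the
# KL-based order-mimic objective are illustrative choices only.

import torch
import torch.nn.functional as F

def lora_sparse_attention(q, k, v, w_q_low, w_k_low, keep_ratio=0.5):
    """q, k, v: (batch, seq, dim) full-rank projections.
    w_q_low, w_k_low: (dim, r) low-rank projections with r << dim.
    keep_ratio=0.5 mirrors the 'more than half' reduction in the abstract."""
    b, n, d = q.shape
    # Approximate attention scores in O(n^2 * r) instead of O(n^2 * d).
    approx = (q @ w_q_low) @ (k @ w_k_low).transpose(-2, -1)  # (b, n, n)

    # Each query keeps only its top-k keys under the low-rank approximation.
    k_keep = max(1, int(n * keep_ratio))
    top_idx = approx.topk(k_keep, dim=-1).indices
    mask = torch.full_like(approx, float("-inf"))
    mask.scatter_(-1, top_idx, 0.0)          # 0 where kept, -inf elsewhere

    # Exact attention restricted to the selected sparse pattern. A real
    # implementation would gather only the kept keys instead of building
    # the dense n x n matrix; the dense mask here is just for clarity.
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5 + mask
    return F.softmax(scores, dim=-1) @ v

def order_mimic_loss(approx_scores, full_scores):
    # One plausible reading of "order-mimic training" (an assumption):
    # distill the frozen full-attention score distribution into the
    # low-rank scores so they rank keys in the same order.
    return F.kl_div(F.log_softmax(approx_scores, dim=-1),
                    F.softmax(full_scores.detach(), dim=-1),
                    reduction="batchmean")

# Toy usage with hypothetical shapes.
b, n, d, r = 2, 16, 64, 8
q, k, v = (torch.randn(b, n, d) for _ in range(3))
w_q_low, w_k_low = torch.randn(d, r), torch.randn(d, r)
out = lora_sparse_attention(q, k, v, w_q_low, w_k_low)   # (2, 16, 64)
```

The dense masked matmul keeps this sketch short; the computational saving the abstract reports would come from evaluating exact scores only for the selected key subset.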
Pages: 13763-13773 (11 pages)
Related Papers (50 total; entries [41]-[50] shown):
  • [41] Sun, Lijuan; Feng, Songhe; Wang, Tao; Lang, Congyan; Jin, Yi. Partial Multi-Label Learning by Low-Rank and Sparse Decomposition. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019: 5016-5023.
  • [42] Sidiropoulos, Nicholas D.; Kyrillidis, Anastasios. Multi-Way Compressed Sensing for Sparse Low-Rank Tensors. IEEE Signal Processing Letters, 2012, 19(11): 757-760.
  • [43] Liu, Xiao-Yang; Zhang, Jie; Wang, Guoxuan; Tong, Weiqin; Walid, Anwar. Efficient Pretraining and Finetuning of Quantized LLMs with Low-rank Structure. 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), 2024: 300-311.
  • [44] Wang, Qingzheng; Li, Shuai; Qin, Hong; Hao, Aimin. Robust multi-modal medical image fusion via anisotropic heat diffusion guided low-rank structural analysis. Information Fusion, 2015, 26: 103-121.
  • [45] Chen, Ziye; Gong, Mingming; Ge, Lingjuan; Du, Bo. Compressed Self-Attention for Deep Metric Learning with Low-Rank Approximation. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), 2020: 2058-2064.
  • [46] Xu, Shiyu; Guo, Qingpei; Yang, Ming; Zhang, Shiliang. Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024: 13838-13848.
  • [47] Abdolali, Maryam; Rahmati, Mohammad. Multiscale Decomposition in Low-Rank Approximation. IEEE Signal Processing Letters, 2017, 24(7): 1015-1019.
  • [48] Gillet, H.; Shalen, P. B.; Skora, R. K. Simplicial Approximation and Low-Rank Trees. Commentarii Mathematici Helvetici, 1991, 66(4): 521-540.
  • [49] Parekh, Ankit; Selesnick, Ivan W. Enhanced Low-Rank Matrix Approximation. IEEE Signal Processing Letters, 2016, 23(4): 493-497.
  • [50] Barlow, Jesse L.; Erbay, Hasan. Modifiable low-rank approximation to a matrix. Numerical Linear Algebra with Applications, 2009, 16(10): 833-860.