Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs

Cited by: 0
Authors:
Song, Lin [1 ]
Chen, Yukang [3 ]
Yang, Shuai [2 ]
Ding, Xiaohan [1 ]
Ge, Yixiao [1 ]
Chen, Ying-Cong [2 ]
Shan, Ying [1 ]
Affiliations:
[1] Tencent AI Lab, Shenzhen, Peoples R China
[2] HKUST GZ, Guangzhou, Peoples R China
[3] CUHK, Hong Kong, Peoples R China
DOI: 10.1109/CVPR52733.2024.01306
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract:
This paper addresses the high computational complexity of self-attention in Large Language Models (LLMs), a significant challenge in both natural language processing (NLP) and multi-modal tasks. We propose Low-Rank Approximation for Sparse Attention (LoRA-Sparse), an approach that strategically reduces this complexity. LoRA-Sparse introduces low-rank linear projection layers that approximate sparse attention, trained with an order-mimic methodology that is crucial for efficiently approximating the self-attention mechanism in LLMs. We empirically show that sparse attention not only reduces computational demands but also improves model performance in both NLP and multi-modal tasks, which suggests, perhaps surprisingly, that much of the attention in LLMs is redundant and not beneficial. We validate LoRA-Sparse through extensive empirical studies on both NLP and multi-modal tasks, demonstrating its effectiveness and general applicability. Based on LLaMA and LLaVA models, our method eliminates more than half of the self-attention computation while achieving even better performance than full-attention baselines.
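The abstract describes the mechanism only at a high level, so the following is a minimal PyTorch sketch of the idea as stated there: cheap low-rank query/key projections approximate the attention-score matrix, the approximation selects which keys each query attends to, and an "order-mimic" objective trains the low-rank scores against full attention. The function names, rank r, keep_ratio, and the KL-based formulation of the order-mimic loss are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the LoRA-Sparse idea from the abstract.
# Assumptions (not from the paper): names, rank r, keep_ratio, and the
# KL-based order-mimic objective are illustrative choices only.

import torch
import torch.nn.functional as F

def lora_sparse_attention(q, k, v, w_q_low, w_k_low, keep_ratio=0.5):
    """q, k, v: (batch, seq, dim) full-rank projections.
    w_q_low, w_k_low: (dim, r) low-rank projections with r << dim.
    keep_ratio=0.5 mirrors the 'more than half' reduction in the abstract."""
    b, n, d = q.shape
    # Approximate attention scores in O(n^2 * r) instead of O(n^2 * d).
    approx = (q @ w_q_low) @ (k @ w_k_low).transpose(-2, -1)  # (b, n, n)

    # Each query keeps only its top-k keys under the low-rank approximation.
    k_keep = max(1, int(n * keep_ratio))
    top_idx = approx.topk(k_keep, dim=-1).indices
    mask = torch.full_like(approx, float("-inf"))
    mask.scatter_(-1, top_idx, 0.0)          # 0 where kept, -inf elsewhere

    # Exact attention restricted to the selected sparse pattern. A real
    # implementation would gather only the kept keys instead of building
    # the dense n x n matrix; the dense mask here is just for clarity.
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5 + mask
    return F.softmax(scores, dim=-1) @ v

def order_mimic_loss(approx_scores, full_scores):
    # One plausible reading of "order-mimic training" (an assumption):
    # distill the frozen full-attention score distribution into the
    # low-rank scores so they rank keys in the same order.
    return F.kl_div(F.log_softmax(approx_scores, dim=-1),
                    F.softmax(full_scores.detach(), dim=-1),
                    reduction="batchmean")

# Toy usage with hypothetical shapes.
b, n, d, r = 2, 16, 64, 8
q, k, v = (torch.randn(b, n, d) for _ in range(3))
w_q_low, w_k_low = torch.randn(d, r), torch.randn(d, r)
out = lora_sparse_attention(q, k, v, w_q_low, w_k_low)   # (2, 16, 64)
```

The dense masked matmul keeps this sketch short; the computational saving the abstract reports would come from evaluating exact scores only for the selected key subset.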
Pages: 13763-13773 (11 pages)
Related Papers (50 total; entries [41]-[50] shown):
  • [41] Sun, Lijuan; Feng, Songhe; Wang, Tao; Lang, Congyan; Jin, Yi. Partial Multi-Label Learning by Low-Rank and Sparse Decomposition. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019: 5016-5023.
  • [42] Sidiropoulos, Nicholas D.; Kyrillidis, Anastasios. Multi-Way Compressed Sensing for Sparse Low-Rank Tensors. IEEE Signal Processing Letters, 2012, 19(11): 757-760.
  • [43] Liu, Xiao-Yang; Zhang, Jie; Wang, Guoxuan; Tong, Weiqin; Walid, Anwar. Efficient Pretraining and Finetuning of Quantized LLMs with Low-rank Structure. 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), 2024: 300-311.
  • [44] Wang, Qingzheng; Li, Shuai; Qin, Hong; Hao, Aimin. Robust multi-modal medical image fusion via anisotropic heat diffusion guided low-rank structural analysis. Information Fusion, 2015, 26: 103-121.
  • [45] Chen, Ziye; Gong, Mingming; Ge, Lingjuan; Du, Bo. Compressed Self-Attention for Deep Metric Learning with Low-Rank Approximation. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), 2020: 2058-2064.
  • [46] Xu, Shiyu; Guo, Qingpei; Yang, Ming; Zhang, Shiliang. Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024: 13838-13848.
  • [47] Abdolali, Maryam; Rahmati, Mohammad. Multiscale Decomposition in Low-Rank Approximation. IEEE Signal Processing Letters, 2017, 24(7): 1015-1019.
  • [48] Gillet, H.; Shalen, P. B.; Skora, R. K. Simplicial Approximation and Low-Rank Trees. Commentarii Mathematici Helvetici, 1991, 66(4): 521-540.
  • [49] Parekh, Ankit; Selesnick, Ivan W. Enhanced Low-Rank Matrix Approximation. IEEE Signal Processing Letters, 2016, 23(4): 493-497.
  • [50] Barlow, Jesse L.; Erbay, Hasan. Modifiable low-rank approximation to a matrix. Numerical Linear Algebra with Applications, 2009, 16(10): 833-860.