Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs

Cited by: 0
Authors
Song, Lin [1 ]
Chen, Yukang [3 ]
Yang, Shuai [2 ]
Ding, Xiaohan [1 ]
Ge, Yixiao [1 ]
Chen, Ying-Cong [2 ]
Shan, Ying [1 ]
Affiliations
[1] Tencent AI Lab, Shenzhen, Peoples R China
[2] HKUST GZ, Guangzhou, Peoples R China
[3] CUHK, Hong Kong, Peoples R China
DOI
10.1109/CVPR52733.2024.01306
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses the high computational complexity of Large Language Models (LLMs), a significant challenge in both natural language processing (NLP) and multi-modal tasks. We propose Low-Rank Approximation for Sparse Attention (LoRA-Sparse), an approach that strategically reduces this complexity. LoRA-Sparse introduces low-rank linear projection layers for sparse attention approximation and uses an order-mimic training methodology, which is crucial for efficiently approximating the self-attention mechanism in LLMs. We empirically show that sparse attention not only reduces computational demands but also enhances model performance in both NLP and multi-modal tasks. This surprising result suggests that the redundant attention in LLMs may not be beneficial. We extensively validate LoRA-Sparse through rigorous empirical studies on both NLP and multi-modal tasks, demonstrating its effectiveness and general applicability. Based on LLaMA and LLaVA models, our method reduces self-attention computation by more than half while achieving even better performance than full-attention baselines.
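The abstract's core idea can be illustrated with a minimal NumPy sketch: cheap low-rank query/key projections score all key positions, only the top-k keys per query are kept, and full attention is computed over that sparse set. This is an illustrative assumption-laden sketch, not the paper's implementation; in LoRA-Sparse the low-rank projections would be trained with order-mimic supervision to reproduce the ranking of the full attention scores, whereas here they are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_low, seq_len, top_k = 64, 8, 16, 4

# Stand-ins for a trained layer's full query/key projections.
Wq = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
Wk = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

# Hypothetical low-rank projections (d_model -> d_low). In the paper these
# would be trained to mimic the *order* of the full attention scores.
Wq_low = rng.standard_normal((d_model, d_low)) / np.sqrt(d_model)
Wk_low = rng.standard_normal((d_model, d_low)) / np.sqrt(d_model)

x = rng.standard_normal((seq_len, d_model))

# Cheap low-rank scores decide which keys each query attends to: O(n^2 * d_low).
scores_low = (x @ Wq_low) @ (x @ Wk_low).T
keep = np.argsort(scores_low, axis=-1)[:, -top_k:]  # top-k key indices per query

# Full-precision attention is then evaluated only over the selected keys;
# all other positions are masked to -inf so their softmax weight is zero.
q, k = x @ Wq, x @ Wk
masked = np.full((seq_len, seq_len), -np.inf)
rows = np.arange(seq_len)[:, None]
masked[rows, keep] = (q @ k.T)[rows, keep] / np.sqrt(d_model)
attn = np.exp(masked - masked.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)  # each row: exactly top_k nonzero weights
```

Because the expensive softmax-attention is only evaluated on the top-k selected keys, the full-rank score computation per query shrinks from all `seq_len` keys to `top_k`, which is where the claimed reduction of more than half of the self-attention computation would come from.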
Pages: 13763-13773
Page count: 11
Related Papers
50 in total
  • [21] Multi-view low-rank sparse subspace clustering
    Brbic, Maria
    Kopriva, Ivica
    PATTERN RECOGNITION, 2018, 73 : 247 - 258
  • [22] Sparse and low-rank representation for multi-label classification
    Zhi-Fen He
    Ming Yang
    Applied Intelligence, 2019, 49 : 1708 - 1723
  • [24] Dynamical low-rank approximation
    Koch, Othmar
    Lubich, Christian
    SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2007, 29 (02) : 434 - 454
  • [25] From low-rank retractions to dynamical low-rank approximation and back
    Seguin, Axel
    Ceruti, Gianluca
    Kressner, Daniel
    BIT NUMERICAL MATHEMATICS, 2024, 64 (03)
  • [26] Multi-modal foreground detection via inter- and intra-modality-consistent low-rank separation
    Zheng, Aihua
    Ye, Naipeng
    Li, Chenglong
    Wang, Xiao
    Tang, Jin
NEUROCOMPUTING, 2020, 371 : 27 - 38
  • [27] Multi-Modal Medical Image Fusion Based on Improved Parameter Adaptive PCNN and Latent Low-Rank Representation
    Zirui Tang
    Xianchun Zhou
    Instrumentation, 2024, 11 (02) : 53 - 63
  • [28] Robust Low-Rank and Sparse Tensor Decomposition for Low-Rank Tensor Completion
    Shi, Yuqing
    Du, Shiqiang
    Wang, Weilan
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 7138 - 7143
  • [29] Low-Rank Bottleneck in Multi-head Attention Models
    Bhojanapalli, Srinadh
    Yun, Chulhee
    Rawat, Ankit Singh
    Reddi, Sashank
    Kumar, Sanjiv
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [30] Ensemble manifold regularized sparse low-rank approximation for multiview feature embedding
    Zhang, Lefei
    Zhang, Qian
    Zhang, Liangpei
    Tao, Dacheng
    Huang, Xin
    Du, Bo
    PATTERN RECOGNITION, 2015, 48 (10) : 3102 - 3112