Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs

Cited by: 0
Authors
Song, Lin [1 ]
Chen, Yukang [3 ]
Yang, Shuai [2 ]
Ding, Xiaohan [1 ]
Ge, Yixiao [1 ]
Chen, Ying-Cong [2 ]
Shan, Ying [1 ]
Affiliations
[1] Tencent AI Lab, Shenzhen, Peoples R China
[2] HKUST GZ, Guangzhou, Peoples R China
[3] CUHK, Hong Kong, Peoples R China
DOI
10.1109/CVPR52733.2024.01306
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
This paper addresses the high computational complexity of self-attention in Large Language Models (LLMs), a significant challenge in both natural language processing (NLP) and multi-modal tasks. We propose Low-Rank Approximation for Sparse Attention (LoRA-Sparse), an approach that strategically reduces this complexity. LoRA-Sparse introduces low-rank linear projection layers for sparse attention approximation, trained with an order-mimic methodology that is crucial for efficiently approximating the self-attention mechanism in LLMs. We empirically show that sparse attention not only reduces computational demands but can also improve model performance in both NLP and multi-modal tasks; this surprising result suggests that redundant attention in LLMs may be non-beneficial. We extensively validate LoRA-Sparse through rigorous empirical studies on NLP and multi-modal tasks, demonstrating its effectiveness and general applicability. Based on LLaMA and LLaVA models, our method can eliminate more than half of the self-attention computation while performing even better than full-attention baselines.
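The abstract describes using cheap low-rank projections to approximate attention scores and then running exact attention only over the highest-scoring keys. Below is a minimal, hypothetical sketch of that idea: the names (`lowrank_sparse_attention`, `P_q`, `P_k`) and the random projections are illustrative assumptions, not the paper's actual trained layers (which use order-mimic supervision).

```python
import numpy as np

def lowrank_sparse_attention(Q, K, V, r=8, k=4, seed=0):
    """Sketch of low-rank-guided sparse attention (hypothetical API).

    Low-rank projections give approximate attention logits cheaply; per
    query, only the top-k keys under the approximation are kept, and
    exact softmax attention runs on that sparse subset.
    """
    rng = np.random.default_rng(seed)
    n, d = Q.shape
    # Rank-r projection matrices. In LoRA-Sparse these are trained
    # (order-mimic); random here purely for illustration.
    P_q = rng.standard_normal((d, r)) / np.sqrt(d)
    P_k = rng.standard_normal((d, r)) / np.sqrt(d)
    # Approximate logits: each entry costs O(r) instead of O(d).
    approx = (Q @ P_q) @ (K @ P_k).T          # (n, n) approximate scores
    topk = np.argsort(-approx, axis=1)[:, :k] # top-k key indices per query
    out = np.zeros_like(Q)
    for i in range(n):
        idx = topk[i]
        s = Q[i] @ K[idx].T / np.sqrt(d)      # exact scores, subset only
        w = np.exp(s - s.max())
        w /= w.sum()                          # softmax over k keys
        out[i] = w @ V[idx]
    return out
```

With trained projections and k well below the sequence length, the exact-attention step touches only a fraction of the keys, which is the source of the claimed compute reduction.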
Pages: 13763-13773
Page count: 11