MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

Cited by: 0
Authors
Farina, Matteo [1 ]
Mancini, Massimiliano [1 ]
Cunegatti, Elia [1 ]
Liu, Gaowen [2 ]
Iacca, Giovanni [1 ]
Ricci, Elisa [1 ,3 ]
Affiliations
[1] Univ Trento, Trento, Italy
[2] Cisco Res, Res Triangle Pk, NC USA
[3] Fdn Bruno Kessler, Povo, Italy
DOI
10.1109/CVPR52733.2024.01532
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
While excellent in transfer learning, Vision-Language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic Vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first gradient-free pruning framework for TA-VLP where: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neurons it connects; and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent sophisticated, combinatorial competitors in the vast majority of cases, paving the way towards addressing TA-VLP. The code is publicly available at https://github.com/FarinaMatteo/multiflow.
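The core idea of point (i) in the abstract — scoring each weight by its magnitude combined with the saliency of the two neurons it connects — can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: here a neuron's saliency is taken to be the mean absolute weight of its incident connections, and `multiflow_scores` and `prune_layer` are hypothetical names introduced for this example.

```python
import numpy as np

def multiflow_scores(w: np.ndarray) -> np.ndarray:
    """Score each parameter of a linear layer with weight matrix `w`
    (shape: out_features x in_features) by its magnitude, scaled by the
    saliency of the input and output neurons it connects."""
    mag = np.abs(w)
    in_saliency = mag.mean(axis=0)    # per-input-neuron saliency
    out_saliency = mag.mean(axis=1)   # per-output-neuron saliency
    # score_ij = |w_ij| * saliency(out_i) * saliency(in_j)
    return mag * np.outer(out_saliency, in_saliency)

def prune_layer(w: np.ndarray, ratio: float) -> np.ndarray:
    """Zero out the lowest-scoring `ratio` fraction of parameters."""
    scores = multiflow_scores(w)
    k = int(ratio * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(scores.ravel(), k - 1)[k - 1]
    return np.where(scores > threshold, w, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
pruned = prune_layer(w, 0.5)  # remove half of the parameters
```

Point (ii) would further adjust how the pruning budget is allocated across the vision and language sub-networks based on the pretrained parameter distribution; that per-modality rebalancing is omitted here for brevity.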
Pages: 16185-16195 (11 pages)