MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

Cited by: 0
Authors
Farina, Matteo [1]
Mancini, Massimiliano [1]
Cunegatti, Elia [1]
Liu, Gaowen [2]
Iacca, Giovanni [1]
Ricci, Elisa [1,3]
Affiliations
[1] Univ Trento, Trento, Italy
[2] Cisco Res, Res Triangle Pk, NC, USA
[3] Fdn Bruno Kessler, Povo, Italy
DOI
10.1109/CVPR52733.2024.01532
CLC classification
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
While excellent in transfer learning, Vision-Language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic Vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first gradient-free pruning framework for TA-VLP where: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neurons it connects; and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent sophisticated, combinatorial competitors in the vast majority of cases, paving the way towards addressing TA-VLP. The code is publicly available at https://github.com/FarinaMatteo/multiflow.
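The scoring rule in (i) can be illustrated with a minimal sketch: each weight is ranked not by magnitude alone, but by magnitude combined with the saliency of the two neurons it connects, and the lowest-scoring fraction is removed. This is an illustrative reconstruction from the abstract only, not the paper's exact formulation; the choice of mean absolute weight as neuron saliency, the multiplicative combination, and the function names are assumptions (see the official repository above for the real implementation).

```python
import numpy as np

def flow_aware_scores(W):
    """Score each weight by its magnitude times the saliency of the
    input and output neurons it connects (illustrative only).
    W has shape (out_features, in_features)."""
    mag = np.abs(W)
    out_saliency = mag.mean(axis=1, keepdims=True)  # one value per output neuron
    in_saliency = mag.mean(axis=0, keepdims=True)   # one value per input neuron
    return mag * out_saliency * in_saliency

def prune_by_ratio(W, ratio):
    """Zero out the lowest-scoring fraction `ratio` of the weights in W."""
    scores = flow_aware_scores(W)
    k = int(ratio * W.size)
    if k == 0:
        return W.copy()
    # k-th smallest score is the pruning threshold (gradient-free: no
    # backward pass or task data is needed to compute it).
    threshold = np.partition(scores.ravel(), k - 1)[k - 1]
    return W * (scores > threshold)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W_pruned = prune_by_ratio(W, 0.5)  # roughly half the weights set to zero
```

Compared to plain magnitude pruning, the neuron-saliency factors let a moderately sized weight survive if it sits on a highly active path, which is one reading of "information flow" in the abstract.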
Pages: 16185-16195
Page count: 11
Related papers
49 items in total
  • [21] Federated Split Vision Transformer for COVID-19 CXR Diagnosis using Task-Agnostic Training
    Park, Sangjoon
    Kim, Gwanghyun
    Kim, Jeongsol
    Kim, Boah
    Ye, Jong Chul
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [22] Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models
    Lv, Yiqiang
    Chen, Jingjing
    Wei, Zhipeng
    Chen, Kai
    Wu, Zuxuan
    Jiang, Yu-Gang
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2831 - 2836
  • [23] Towards Multimodal Disinformation Detection by Vision-language Knowledge Interaction
    Li, Qilei
    Gao, Mingliang
    Zhang, Guisheng
    Zhai, Wenzhe
    Chen, Jinyong
    Jeon, Gwanggil
    INFORMATION FUSION, 2024, 102
  • [24] VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
    Zhou, Wangchunshu
    Zeng, Yan
    Diao, Shizhe
    Zhang, Xinsong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [25] IVTP: Instruction-Guided Visual Token Pruning for Large Vision-Language Models
    Huang, Kai
    Zou, Hao
    Xi, Ye
    Wang, BoChen
    Xie, Zhen
    Yu, Liang
    COMPUTER VISION - ECCV 2024, PT XVII, 2025, 15075 : 214 - 230
  • [26] MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning
    Gao, Yuan
    Bai, Haoping
    Jie, Zequn
    Ma, Jiayi
    Jia, Kui
    Liu, Wei
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 11540 - 11549
  • [27] From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
    Ge, Jiaxin
    Subramanian, Sanjay
    Darrell, Trevor
    Li, Boyi
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 1173 - 1185
  • [28] Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
    Palit, Vedant
    Pandey, Rohan
    Arora, Aryaman
    Liang, Paul Pu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2848 - 2853
  • [29] Towards Adversarial Attack on Vision-Language Pre-training Models
    Zhang, Jiaming
    Yi, Qi
    Sang, Jitao
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5005 - 5013
  • [30] Task-to-Instance Prompt Learning for Vision-Language Models at Test Time
    Lu, Zhihe
    Bai, Jiawang
    Li, Xin
    Xiao, Zeyu
    Wang, Xinchao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 1908 - 1920