MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

Cited by: 0
Authors
Farina, Matteo [1]
Mancini, Massimiliano [1]
Cunegatti, Elia [1]
Liu, Gaowen [2]
Iacca, Giovanni [1]
Ricci, Elisa [1,3]
Affiliations
[1] Univ Trento, Trento, Italy
[2] Cisco Res, Res Triangle Pk, NC, USA
[3] Fdn Bruno Kessler, Povo, Italy
DOI
10.1109/CVPR52733.2024.01532
CLC classification
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
While excellent in transfer learning, Vision-Language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic Vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first gradient-free pruning framework for TA-VLP where: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neurons it connects; and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent sophisticated, combinatorial competitors in the vast majority of cases, paving the way towards addressing TA-VLP. The code is publicly available at https://github.com/FarinaMatteo/multiflow.
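The scoring rule in (i) can be illustrated with a minimal sketch: each weight is ranked not by magnitude alone, but by magnitude combined with the saliency of the two neurons it connects, and the lowest-scoring fraction is removed. This is an illustrative reconstruction from the abstract only, not the paper's exact formulation; the choice of mean absolute weight as neuron saliency, the multiplicative combination, and the function names are assumptions (see the official repository above for the real implementation).

```python
import numpy as np

def flow_aware_scores(W):
    """Score each weight by its magnitude times the saliency of the
    input and output neurons it connects (illustrative only).
    W has shape (out_features, in_features)."""
    mag = np.abs(W)
    out_saliency = mag.mean(axis=1, keepdims=True)  # one value per output neuron
    in_saliency = mag.mean(axis=0, keepdims=True)   # one value per input neuron
    return mag * out_saliency * in_saliency

def prune_by_ratio(W, ratio):
    """Zero out the lowest-scoring fraction `ratio` of the weights in W."""
    scores = flow_aware_scores(W)
    k = int(ratio * W.size)
    if k == 0:
        return W.copy()
    # k-th smallest score is the pruning threshold (gradient-free: no
    # backward pass or task data is needed to compute it).
    threshold = np.partition(scores.ravel(), k - 1)[k - 1]
    return W * (scores > threshold)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W_pruned = prune_by_ratio(W, 0.5)  # roughly half the weights set to zero
```

Compared to plain magnitude pruning, the neuron-saliency factors let a moderately sized weight survive if it sits on a highly active path, which is one reading of "information flow" in the abstract.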
Pages: 16185-16195
Page count: 11
Related papers
49 items in total
  • [21] Federated Split Vision Transformer for COVID-19 CXR Diagnosis using Task-Agnostic Training
    Park, Sangjoon
    Kim, Gwanghyun
    Kim, Jeongsol
    Kim, Boah
    Ye, Jong Chul
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [22] Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models
    Lv, Yiqiang
    Chen, Jingjing
    Wei, Zhipeng
    Chen, Kai
    Wu, Zuxuan
    Jiang, Yu-Gang
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2831 - 2836
  • [23] Towards Multimodal Disinformation Detection by Vision-language Knowledge Interaction
    Li, Qilei
    Gao, Mingliang
    Zhang, Guisheng
    Zhai, Wenzhe
    Chen, Jinyong
    Jeon, Gwanggil
    INFORMATION FUSION, 2024, 102
  • [24] VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
    Zhou, Wangchunshu
    Zeng, Yan
    Diao, Shizhe
    Zhang, Xinsong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [25] IVTP: Instruction-Guided Visual Token Pruning for Large Vision-Language Models
    Huang, Kai
    Zou, Hao
    Xi, Ye
    Wang, BoChen
    Xie, Zhen
    Yu, Liang
    COMPUTER VISION - ECCV 2024, PT XVII, 2025, 15075 : 214 - 230
  • [26] MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning
    Gao, Yuan
    Bai, Haoping
    Jie, Zequn
    Ma, Jiayi
    Jia, Kui
    Liu, Wei
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 11540 - 11549
  • [27] From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
    Ge, Jiaxin
    Subramanian, Sanjay
    Darrell, Trevor
    Li, Boyi
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 1173 - 1185
  • [28] Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
    Palit, Vedant
    Pandey, Rohan
    Arora, Aryaman
    Liang, Paul Pu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2848 - 2853
  • [29] Towards Adversarial Attack on Vision-Language Pre-training Models
    Zhang, Jiaming
    Yi, Qi
    Sang, Jitao
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5005 - 5013
  • [30] Task-to-Instance Prompt Learning for Vision-Language Models at Test Time
    Lu, Zhihe
    Bai, Jiawang
    Li, Xin
    Xiao, Zeyu
    Wang, Xinchao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 1908 - 1920