MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

Cited by: 0
Authors
Farina, Matteo [1 ]
Mancini, Massimiliano [1 ]
Cunegatti, Elia [1 ]
Liu, Gaowen [2 ]
Iacca, Giovanni [1 ]
Ricci, Elisa [1 ,3 ]
Affiliations
[1] Univ Trento, Trento, Italy
[2] Cisco Res, Res Triangle Pk, NC USA
[3] Fdn Bruno Kessler, Povo, Italy
DOI
10.1109/CVPR52733.2024.01532
CLC classification: TP18 [Theory of artificial intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
While excellent in transfer learning, Vision-Language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic Vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first gradient-free pruning framework for TA-VLP where: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neurons it connects; and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent sophisticated, combinatorial competitors in the vast majority of cases, paving the way towards addressing TA-VLP. The code is publicly available at https://github.com/FarinaMatteo/multiflow.
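As a rough illustration of the flow-based scoring idea sketched in the abstract (scoring a weight by its magnitude modulated by the saliency of the neurons it connects), the snippet below applies one plausible reading to a single linear layer. This is a hypothetical sketch, not the authors' implementation: the function names and the choice of mean-magnitude neuron saliency are assumptions here; the actual method is in the linked repository.

```python
import numpy as np

def multiflow_like_scores(weight):
    """Score each weight as |w_ij| * saliency(out neuron i) * saliency(in neuron j).

    Neuron saliency is approximated (assumption) as the mean magnitude of all
    weights incident to that neuron, so a connection between two well-connected
    neurons scores higher than raw magnitude alone would suggest.
    """
    mag = np.abs(weight)                          # shape (out_dim, in_dim)
    out_sal = mag.mean(axis=1, keepdims=True)     # (out_dim, 1): per-output-neuron saliency
    in_sal = mag.mean(axis=0, keepdims=True)      # (1, in_dim): per-input-neuron saliency
    return mag * out_sal * in_sal

def prune_by_ratio(weight, ratio):
    """Zero out exactly the lowest-scoring `ratio` fraction of weights."""
    scores = multiflow_like_scores(weight)
    k = int(scores.size * ratio)
    if k == 0:
        return weight.copy()
    flat = scores.ravel()
    drop = np.argpartition(flat, k)[:k]           # indices of the k smallest scores
    mask = np.ones(flat.size, dtype=bool)
    mask[drop] = False
    return weight * mask.reshape(weight.shape)
```

Note that this is gradient-free, as the abstract emphasizes: the score is computed purely from the pretrained weights, with no data or backward pass. The paper's full method additionally shapes the per-layer budget by the multimodal parameter distribution, which this single-layer sketch does not model.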
Pages: 16185-16195 (11 pages)