Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

Cited by: 1
Authors
Li, Xuanlin [1 ]
Fang, Yunhao [1 ]
Liu, Minghua [1 ]
Ling, Zhan [1 ]
Tu, Zhuowen [1 ]
Su, Hao [1 ]
Affiliations
[1] Univ Calif San Diego, La Jolla, CA 92093 USA
Keywords
DOI
10.1109/ICCV51070.2023.00236
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Large vision-language models have achieved outstanding performance, but their size and computational requirements make deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction toward a solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles from the vision and language modality perspectives to enhance the student's OOD generalization: (1) better imitating the teacher's visual representation space and carefully promoting better coherence in vision-language alignment with the teacher; (2) enriching the teacher's language representations with informative and fine-grained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate these techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Code released at this link.
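The vision-modality principle described in the abstract (imitating the teacher's visual representation space while keeping vision-language alignment coherent with the teacher) can be sketched as a two-term distillation loss. This is a minimal illustrative sketch, not the paper's actual implementation: the toy encoders, feature dimensions, temperature, and equal loss weighting are all placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the frozen teacher and the lightweight student image
# encoders. Architectures and dimensions are placeholders, not the paper's.
teacher_visual = nn.Linear(32, 64)   # frozen teacher image encoder
student_visual = nn.Linear(32, 64)   # trainable student image encoder
for p in teacher_visual.parameters():
    p.requires_grad_(False)

def distill_losses(images, text_embeds):
    """images: (B, 32); text_embeds: (C, 64) teacher label embeddings."""
    with torch.no_grad():
        t_feat = F.normalize(teacher_visual(images), dim=-1)
    s_feat = F.normalize(student_visual(images), dim=-1)

    # Term 1: imitate the teacher's visual representation space
    # (cosine distance between student and teacher features).
    mimic_loss = (1.0 - (s_feat * t_feat).sum(-1)).mean()

    # Term 2: keep vision-language alignment coherent with the teacher by
    # matching image-to-label similarity distributions (KL divergence).
    text = F.normalize(text_embeds, dim=-1)
    t_logits = t_feat @ text.t() / 0.07   # temperature is a placeholder
    s_logits = s_feat @ text.t() / 0.07
    align_loss = F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.softmax(t_logits, dim=-1),
        reduction="batchmean",
    )
    return mimic_loss + align_loss  # equal weighting is an assumption

loss = distill_losses(torch.randn(8, 32), torch.randn(10, 64))
loss.backward()  # gradients flow only into the student encoder
```

Because the teacher is frozen and its features are computed under `no_grad`, only the student encoder receives gradients from both terms.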
Pages: 2492 - 2503
Page count: 12
Related Papers
50 records total
  • [1] Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
    Zhou, Andy
    Wang, Jindong
    Wang, Yu-Xiong
    Wang, Haohan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?
    Ming, Yifei
    Li, Yixuan
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132: 596 - 609
  • [3] How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?
    Ming, Yifei
    Li, Yixuan
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (02) : 596 - 609
  • [4] Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization
    Zhu, Lin
    Yin, Weihan
    Yang, Yiyao
    Wu, Fan
    Zeng, Zhaoyu
    Gu, Qinying
    Wang, Xinbing
    Zhou, Chenghu
    Ye, Nanyang
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (09) : 3375 - 3407
  • [5] Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning
    Ding, Kun
    Zhang, Haojian
    Yu, Qiang
    Wang, Ying
    Xiang, Shiming
    Pan, Chunhong
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1528 - 1536
  • [6] A Stable Vision Transformer for Out-of-Distribution Generalization
    Yu, Haoran
    Liu, Baodi
    Wang, Yingjie
    Zhang, Kai
    Tao, Dapeng
    Liu, Weifeng
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, 14432 : 328 - 339
  • [7] Effectiveness assessment of recent large vision-language models
    Jiang, Yao
    Yan, Xinyu
    Ji, Ge-Peng
    Fu, Keren
    Sun, Meijun
    Xiong, Huan
    Fan, Deng-Ping
    Khan, Fahad Shahbaz
    [J]. VISUAL INTELLIGENCE, 2 (1)
  • [8] On Evaluating Adversarial Robustness of Large Vision-Language Models
    Zhao, Yunqing
    Pang, Tianyu
    Du, Chao
    Yang, Xiao
    Li, Chongxuan
    Cheung, Ngai-Man
    Lin, Min
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [9] MixPrompt: Enhancing Generalizability and Adversarial Robustness for Vision-Language Models via Prompt Fusion
    Fan, Hao
    Ma, Zhaoyang
    Li, Yong
    Tian, Rui
    Chen, Yunli
    Gao, Chenlong
    [J]. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IX, ICIC 2024, 2024, 14870 : 328 - 339
  • [10] Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification
    Xuan, Yunyi
    Chen, Weijie
    Yang, Shicai
    Xie, Di
    Lin, Luojun
    Zhuang, Yueting
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4928 - 4938