Towards an Exhaustive Evaluation of Vision-Language Foundation Models

Cited by: 0
Authors
Salin, Emmanuelle [1 ]
Ayache, Stephane [1 ]
Favre, Benoit [1 ]
Affiliations
[1] Univ Toulon & Var, Aix Marseille Univ, CNRS, LIS, Marseille, France
DOI: 10.1109/ICCVW60793.2023.00041
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Vision-language foundation models have seen considerable performance improvements in the last few years. However, there is still a lack of comprehensive evaluation methods able to clearly explain their performance. We argue that a more systematic approach to foundation model evaluation would be beneficial to their use in real-world applications. In particular, we think that those models should be evaluated on a broad range of precise capabilities, in order to bring awareness to the breadth of their scope and their potential weaknesses. To that end, we propose a methodology to build a taxonomy of multimodal capabilities for vision-language foundation models. The proposed taxonomy is intended as a first step towards an exhaustive evaluation of vision-language foundation models.
Pages: 339-352 (14 pages)
Related papers
50 results in total
  • [1] Equivariant Similarity for Vision-Language Foundation Models
    Wang, Tan
    Lin, Kevin
    Li, Linjie
    Lin, Chung-Ching
    Yang, Zhengyuan
    Zhang, Hanwang
    Liu, Zicheng
    Wang, Lijuan
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 11964 - 11974
  • [2] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
    Ma, Zixian
    Hong, Jerry
    Gul, Mustafa Omer
    Gandhi, Mona
    Gao, Irena
    Krishna, Ranjay
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10910 - 10921
  • [3] Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
    Peng, Wenshuo
    Zhang, Kaipeng
    Yang, Yue
    Zhang, Hao
    Qiao, Yu
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4506 - 4514
  • [4] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644
  • [5] Experiential Views: Towards Human Experience Evaluation of Designed Spaces using Vision-Language Models
    Aseniero, Bon Adriel
    Lee, Michael
    Wang, Yi
    Zhou, Qian
    Shahmansouri, Nastaran
    Goldstein, Rhys
    [J]. EXTENDED ABSTRACTS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2024, 2024,
  • [6] Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
    Zhou, Andy
    Wang, Jindong
    Wang, Yu-Xiong
    Wang, Haohan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [7] Towards Adversarial Attack on Vision-Language Pre-training Models
    Zhang, Jiaming
    Yi, Qi
    Sang, Jitao
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5005 - 5013
  • [8] Learning to Prompt for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
  • [10] VISION-LANGUAGE MODELS AS SUCCESS DETECTORS
    Du, Yuqing
    Konyushkova, Ksenia
    Denil, Misha
    Raju, Akhil
    Landon, Jessica
    Hill, Felix
    de Freitas, Nando
    Cabi, Serkan
    [J]. CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 232, 2023, 232 : 120 - 136