Towards an Exhaustive Evaluation of Vision-Language Foundation Models

Cited by: 0
Authors
Salin, Emmanuelle [1 ]
Ayache, Stephane [1 ]
Favre, Benoit [1 ]
Affiliations
[1] Univ Toulon & Var, Aix Marseille Univ, CNRS, LIS, Marseille, France
DOI: 10.1109/ICCVW60793.2023.00041
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Vision-language foundation models have seen considerable performance improvements in the last few years. However, there is still a lack of comprehensive evaluation methods able to clearly explain their performance. We argue that a more systematic approach to foundation model evaluation would be beneficial to their use in real-world applications. In particular, we think that those models should be evaluated on a broad range of precise capabilities, in order to bring awareness to the breadth of their scope and their potential weaknesses. To that end, we propose a methodology to build a taxonomy of multimodal capabilities for vision-language foundation models. The proposed taxonomy is intended as a first step towards an exhaustive evaluation of vision-language foundation models.
Pages: 339-352 (14 pages)
Related papers
50 results in total
  • [1] Equivariant Similarity for Vision-Language Foundation Models
    Wang, Tan
    Lin, Kevin
    Li, Linjie
    Lin, Chung-Ching
    Yang, Zhengyuan
    Zhang, Hanwang
    Liu, Zicheng
    Wang, Lijuan
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 11964 - 11974
  • [2] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
    Ma, Zixian
    Hong, Jerry
    Gul, Mustafa Omer
    Gandhi, Mona
    Gao, Irena
    Krishna, Ranjay
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10910 - 10921
  • [3] Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
    Peng, Wenshuo
    Zhang, Kaipeng
    Yang, Yue
    Zhang, Hao
    Qiao, Yu
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4506 - 4514
  • [4] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644
  • [5] Experiential Views: Towards Human Experience Evaluation of Designed Spaces using Vision-Language Models
    Aseniero, Bon Adriel
    Lee, Michael
    Wang, Yi
    Zhou, Qian
    Shahmansouri, Nastaran
    Goldstein, Rhys
    [J]. EXTENDED ABSTRACTS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2024, 2024,
  • [6] Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
    Zhou, Andy
    Wang, Jindong
    Wang, Yu-Xiong
    Wang, Haohan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [7] Towards Adversarial Attack on Vision-Language Pre-training Models
    Zhang, Jiaming
    Yi, Qi
    Sang, Jitao
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5005 - 5013
  • [8] Learning to Prompt for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
  • [10] VISION-LANGUAGE MODELS AS SUCCESS DETECTORS
    Du, Yuqing
    Konyushkova, Ksenia
    Denil, Misha
    Raju, Akhil
    Landon, Jessica
    Hill, Felix
    de Freitas, Nando
    Cabi, Serkan
    [J]. CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 232, 2023, 232 : 120 - 136