Evolving Interpretable Visual Classifiers with Large Language Models

Authors
Chiquier, Mia [1]
Mall, Utkarsh [1]
Vondrick, Carl [1]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
Funding
US National Science Foundation
Keywords
Visual Recognition; Interpretable Representations
DOI
10.1007/978-3-031-73039-9_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to their open-vocabulary flexibility and high performance. However, vision-language models, which compute similarity scores between images and class labels, are largely black-box, with limited interpretability, a risk of bias, and an inability to discover new visual concepts that have not been written down. Moreover, in practical settings, the vocabulary for class names and attributes of specialized concepts will not be known, preventing these methods from performing well on images uncommon in large-scale vision-language datasets. To address these limitations, we present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition. We introduce an evolutionary search algorithm that uses the in-context learning abilities of large language models to iteratively mutate a concept bottleneck of attributes for classification. Our method produces state-of-the-art, interpretable fine-grained classifiers. We outperform the baselines by 18.4% on five fine-grained iNaturalist datasets and by 22.2% on two KikiBouba datasets, despite the baselines having access to privileged information.
Pages: 183-201
Page count: 19
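
The evolutionary search outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the prompts, the scoring model, or the selection rule, so everything below is an assumption made for illustration. In particular, `toy_mutate` stands in for the large language model that proposes attribute mutations, `toy_fitness` stands in for a classification score computed from a vision-language model, and `VOCAB`, `USEFUL`, and `evolve_attributes` are hypothetical names.

```python
import random
from typing import Callable, Tuple

Attributes = Tuple[str, ...]

# Hypothetical attribute vocabulary and a hidden "discriminative" subset.
# Both exist only so that this toy example has something to optimize.
VOCAB = [
    "striped wings", "red beak", "long tail", "spotted chest",
    "webbed feet", "curved bill", "yellow crown", "short legs",
]
USEFUL = {"red beak", "spotted chest", "yellow crown"}


def toy_mutate(attrs: Attributes, rng: random.Random) -> Attributes:
    """Stand-in for the LLM mutation step: swap one attribute at random."""
    i = rng.randrange(len(attrs))
    return attrs[:i] + (rng.choice(VOCAB),) + attrs[i + 1:]


def toy_fitness(attrs: Attributes) -> float:
    """Stand-in for classification accuracy: count the useful attributes."""
    return float(len(USEFUL & set(attrs)))


def evolve_attributes(
    initial: Attributes,
    mutate: Callable[[Attributes, random.Random], Attributes],
    fitness: Callable[[Attributes], float],
    generations: int = 20,
    population_size: int = 8,
    seed: int = 0,
) -> Tuple[Attributes, float]:
    """Elitist evolutionary loop over attribute sets (an assumed design)."""
    rng = random.Random(seed)
    population = [initial]
    for _ in range(generations):
        # Every surviving attribute set proposes two mutated children.
        children = [mutate(p, rng) for p in population for _ in range(2)]
        # Keep parents in the pool so the best score never decreases,
        # and drop exact duplicates while preserving order.
        candidates = list(dict.fromkeys(population + children))
        candidates.sort(key=fitness, reverse=True)
        population = candidates[:population_size]
    best = population[0]
    return best, fitness(best)


start: Attributes = ("striped wings", "long tail", "webbed feet")
best, score = evolve_attributes(start, toy_mutate, toy_fitness)
print(best, score)
```

Because parents stay in the candidate pool, the best score is non-decreasing across generations. In a realistic setting the fitness call would be the expensive step, and the mutation step would be conditioned on previously scored attribute sets rather than drawn at random.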