Evolving Interpretable Visual Classifiers with Large Language Models

Cited by: 0
Authors
Chiquier, Mia [1 ]
Mall, Utkarsh [1 ]
Vondrick, Carl [1 ]
Affiliation
[1] Columbia Univ, New York, NY 10027 USA
Funding
U.S. National Science Foundation
Keywords
Visual Recognition; Interpretable Representations
DOI
10.1007/978-3-031-73039-9_11
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to their open-vocabulary flexibility and high performance. However, vision-language models, which compute similarity scores between images and class labels, are largely black-box: they offer limited interpretability, risk encoding bias, and cannot discover visual concepts that have not been written down. Moreover, in practical settings, the vocabulary of class names and attributes for specialized concepts is often unknown, preventing these methods from performing well on images that are uncommon in large-scale vision-language datasets. To address these limitations, we present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition. We introduce an evolutionary search algorithm that uses the in-context learning abilities of large language models to iteratively mutate a concept bottleneck of attributes for classification. Our method produces state-of-the-art, interpretable fine-grained classifiers. We outperform the baselines by 18.4% on five fine-grained iNaturalist datasets and by 22.2% on two KikiBouba datasets, despite the baselines having access to privileged information.
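The evolutionary search the abstract describes can be sketched roughly as follows. This is an illustrative stand-in only: the real method uses an LLM to propose mutated attribute sets and a vision-language model (e.g. CLIP) to score them, whereas here both steps are replaced with toy substitutes. All names (`ATTRIBUTE_POOL`, `propose_mutation`, `fitness`, `evolve`) are hypothetical, not the authors' code.

```python
import random

# Toy pool of candidate attributes; in the paper these would be
# LLM-generated natural-language descriptors.
ATTRIBUTE_POOL = [
    "striped wings", "red crest", "long curved beak", "webbed feet",
    "mottled brown plumage", "forked tail", "yellow eye ring", "white belly",
]

def propose_mutation(bottleneck, rng):
    """Stand-in for the LLM mutation step: swap one attribute in the
    concept bottleneck for a new attribute not already present."""
    candidate = list(bottleneck)
    idx = rng.randrange(len(candidate))
    candidate[idx] = rng.choice(
        [a for a in ATTRIBUTE_POOL if a not in candidate]
    )
    return candidate

def fitness(bottleneck):
    """Toy stand-in for the real objective (classification accuracy of a
    CLIP-scored attribute bottleneck); here it just rewards wordier,
    more specific attribute phrases."""
    return sum(len(a.split()) for a in bottleneck)

def evolve(generations=20, k=3, seed=0):
    """Greedy (1+1)-style evolutionary loop: keep a mutation only if it
    improves fitness, and return the best bottleneck found."""
    rng = random.Random(seed)
    best = rng.sample(ATTRIBUTE_POOL, k)
    best_score = fitness(best)
    for _ in range(generations):
        candidate = propose_mutation(best, rng)
        score = fitness(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Running `evolve()` returns an interpretable set of `k` attribute strings plus its score; in the actual method that returned bottleneck is what makes the final classifier human-readable.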
Pages: 183-201 (19 pages)
Related Papers
50 records total
  • [21] Visual Adversarial Examples Jailbreak Aligned Large Language Models
    Qi, Xiangyu
    Huang, Kaixuan
    Panda, Ashwinee
    Henderson, Peter
    Wang, Mengdi
    Mittal, Prateek
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 19, 2024, : 21527 - 21536
  • [22] Visual Adversarial Examples Jailbreak Aligned Large Language Models
    Princeton University, United States
    Proc. AAAI Conf. Artif. Intell., 19: 21527 - 21536
  • [23] LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
    Feng, Weixi
    Zhu, Wanrong
    Fu, Tsu-jui
    Jampani, Varun
    Akula, Arjun
    He, Xuehai
    Basu, Sugato
    Wang, Xin Eric
    Wang, William Yang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [24] LEVA: Using Large Language Models to Enhance Visual Analytics
    Zhao, Yuheng
    Zhang, Yixing
    Zhang, Yu
    Zhao, Xinyi
    Wang, Junjie
    Shao, Zekai
    Turkay, Cagatay
    Chen, Siming
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2025, 31 (03) : 1830 - 1847
  • [25] Learning Interpretable Models in the Property Specification Language
    Roy, Rajarshi
    Fisman, Dana
    Neider, Daniel
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2213 - 2219
  • [26] Genetic Programming for Evolving a Front of Interpretable Models for Data Visualization
    Lensen, Andrew
    Xue, Bing
    Zhang, Mengjie
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (11) : 5468 - 5482
  • [27] Evolving structure and parameters of fuzzy models with interpretable membership functions
    Kim, MS
    Kim, CH
    Lee, JJ
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2005, 16 (02) : 95 - 105
  • [28] Large scale classifiers for visual classification tasks
    Thanh-Nghi Doan
    Thanh-Nghi Do
    Poulet, Francois
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (04) : 1199 - 1224
  • [29] Large scale classifiers for visual classification tasks
    Thanh-Nghi Doan
    Thanh-Nghi Do
    François Poulet
    Multimedia Tools and Applications, 2015, 74 : 1199 - 1224
  • [30] Exploiting Large Language Models for Enhanced Review Classification Explanations Through Interpretable and Multidimensional Analysis
    Cosentino, Cristian
    Gunduz-Cure, Merve
    Marozzo, Fabrizio
    Ozturk-Birim, Sule
    DISCOVERY SCIENCE, DS 2024, PT I, 2025, 15243 : 3 - 18