Evolving Interpretable Visual Classifiers with Large Language Models

Cited: 0
Authors
Chiquier, Mia [1 ]
Mall, Utkarsh [1 ]
Vondrick, Carl [1 ]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
Source
Funding
U.S. National Science Foundation;
Keywords
Visual Recognition; Interpretable Representations;
DOI
10.1007/978-3-031-73039-9_11
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Multimodal pre-trained models such as CLIP are popular for zero-shot classification because of their open-vocabulary flexibility and high performance. However, vision-language models, which compute similarity scores between images and class labels, are largely black boxes: they offer limited interpretability, carry a risk of bias, and cannot discover new visual concepts that have not been written down. Moreover, in practical settings the vocabulary of class names and attributes for specialized concepts is unknown, preventing these methods from performing well on images that are uncommon in large-scale vision-language datasets. To address these limitations, we present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition. We introduce an evolutionary search algorithm that uses the in-context learning abilities of large language models to iteratively mutate a concept bottleneck of attributes for classification. Our method produces state-of-the-art, interpretable fine-grained classifiers. We outperform the baselines by 18.4% on five fine-grained iNaturalist datasets and by 22.2% on two KikiBouba datasets, despite the baselines having access to privileged information.
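The abstract describes an evolutionary search in which a large language model mutates a concept bottleneck (a set of textual attributes) and candidates are scored by how well a vision-language model classifies with them. A minimal sketch of that loop, under stated assumptions: the generic `evolve_attributes` driver mirrors the mutate-and-select structure only, while `toy_mutate` and `toy_fitness` are hypothetical stand-ins for the paper's LLM-based mutation and CLIP-based scoring, which are not reproduced here.

```python
import random

def evolve_attributes(initial_pop, mutate, fitness, generations=10, keep=2):
    """Generic evolutionary loop over attribute sets (concept bottlenecks).

    mutate(attrs)  -> a new candidate attribute list (in the paper, an LLM
                      proposes mutations via in-context learning).
    fitness(attrs) -> a score for a classifier built on `attrs` (in the
                      paper, accuracy of a CLIP-style image-attribute scorer).
    """
    population = list(initial_pop)
    for _ in range(generations):
        # Keep the best-scoring bottlenecks as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:keep]
        # Refill the population with mutated offspring of random parents.
        offspring = [mutate(random.choice(parents))
                     for _ in range(len(initial_pop) - keep)]
        population = parents + offspring
    return max(population, key=fitness)

# Toy stand-ins (NOT the paper's components): fitness rewards overlap with
# a fixed "discriminative" attribute set; mutate swaps in a random attribute.
VOCAB = ["striped fur", "webbed feet", "hooked beak", "red crest", "long tail"]
TARGET = {"hooked beak", "red crest"}

def toy_fitness(attrs):
    return len(set(attrs) & TARGET)

def toy_mutate(attrs):
    new = list(attrs)
    new[random.randrange(len(new))] = random.choice(VOCAB)
    return new

random.seed(0)
best = evolve_attributes([["striped fur", "long tail"] for _ in range(6)],
                         toy_mutate, toy_fitness, generations=30)
print(best)
```

The loop only assumes that mutation proposals and fitness scoring are pluggable, which is what lets an LLM serve as the mutation operator without changing the search itself.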
Pages: 183-201
Page count: 19
Related Papers
(50 total; items [41]-[50] shown)
  • [41] Visual In-Context Learning for Large Vision-Language Models
    Zhou, Yucheng
    Le, Xiang
    Wang, Qianning
    Shen, Jianbing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 15890 - 15902
  • [42] Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation
    Kritharoula, Anastasia
    Lymperaiou, Maria
    Stamou, Giorgos
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 13053 - 13077
  • [43] Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
    Ma, Chuofan
    Jiang, Yi
    Wu, Jiannan
    Yuan, Zehuan
    Qi, Xiaojuan
    COMPUTER VISION - ECCV 2024, PT VI, 2025, 15064 : 417 - 435
  • [44] Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
    Khan, Zaid
    Kumar, Vijay B. G.
    Schulter, Samuel
    Fu, Yun
    Chandraker, Manmohan
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 14344 - 14353
  • [45] Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models
    Louis, Antoine
    van Dijck, Gijs
    Spanakis, Gerasimos
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 20, 2024, : 22266 - 22275
  • [46] Personalized Classifiers: Evolving a Classifier from a Large Reference Knowledge Graph
    Bairi, Ramakrishna B.
    Ramakrishnan, Ganesh
    Sindhwani, Vikas
    PROCEEDINGS OF THE 18TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM (IDEAS14), 2014, : 132 - 141
  • [47] Scaling large margin classifiers for spoken language understanding
    Haffner, P
    SPEECH COMMUNICATION, 2006, 48 (3-4) : 239 - 261
  • [48] Large Language Models are Not Models of Natural Language: They are Corpus Models
    Veres, Csaba
    IEEE ACCESS, 2022, 10 : 61970 - 61979
  • [49] Augmenting Naive Bayes Classifiers with Statistical Language Models
    Fuchun Peng
    Dale Schuurmans
    Shaojun Wang
    Information Retrieval, 2004, 7 : 317 - 345
  • [50] Augmenting naive Bayes classifiers with statistical language models
    Peng, FC
    Schuurmans, D
    Wang, SJ
    INFORMATION RETRIEVAL, 2004, 7 (3-4): 317 - 345