Evolving Interpretable Visual Classifiers with Large Language Models

Cited by: 0
Authors
Chiquier, Mia [1 ]
Mall, Utkarsh [1 ]
Vondrick, Carl [1 ]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
Funding
U.S. National Science Foundation
Keywords
Visual Recognition; Interpretable Representations;
DOI
10.1007/978-3-031-73039-9_11
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to their open-vocabulary flexibility and high performance. However, vision-language models, which compute similarity scores between images and class labels, are largely black-box, with limited interpretability, risk of bias, and an inability to discover new visual concepts not written down. Moreover, in practical settings, the vocabulary for class names and attributes of specialized concepts will not be known, preventing these methods from performing well on images uncommon in large-scale vision-language datasets. To address these limitations, we present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition. We introduce an evolutionary search algorithm that uses the in-context learning abilities of large language models to iteratively mutate a concept bottleneck of attributes for classification. Our method produces state-of-the-art, interpretable fine-grained classifiers. We outperform the baselines by 18.4% on five fine-grained iNaturalist datasets and by 22.2% on two KikiBouba datasets, despite the baselines having access to privileged information.
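The evolutionary search the abstract describes can be sketched as a toy loop. This is a minimal illustration, not the paper's implementation: here `fitness` is a stand-in for classification accuracy obtained by scoring image-attribute similarity with a vision-language model such as CLIP, and `llm_mutate` is a stand-in for prompting an LLM in-context with the best-scoring attribute sets to propose mutated ones. The attribute pool and target set are invented for the example.

```python
import random

random.seed(0)

# Toy attribute vocabulary and a toy "ideal" bottleneck (both hypothetical).
ATTRIBUTE_POOL = [
    "striped wings", "hooked beak", "red crest", "webbed feet",
    "long tail", "spotted back", "curved claws", "white belly",
]
TARGET = {"striped wings", "hooked beak", "red crest"}


def fitness(attributes):
    """Stand-in for the accuracy of a classifier built on these attributes;
    the paper would score images against attributes with a VLM instead."""
    return len(set(attributes) & TARGET) / len(TARGET)


def llm_mutate(parent):
    """Stand-in for an LLM proposing a mutated attribute set in-context."""
    child = list(parent)
    child[random.randrange(len(child))] = random.choice(ATTRIBUTE_POOL)
    return child


def evolve(pop_size=8, n_attrs=3, generations=30):
    """Evolutionary search: keep the fittest half, refill by mutation."""
    population = [random.sample(ATTRIBUTE_POOL, n_attrs) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = [llm_mutate(random.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)


best = evolve()
print(sorted(best), fitness(best))
```

The design point being illustrated: the language model replaces hand-designed mutation operators, so the search space stays human-readable (named attributes) while the selection pressure comes from downstream classification performance.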
Pages: 183-201 (19 pages)
Related Papers
50 in total
  • [1] Evolving Artificial Datasets to Improve Interpretable Classifiers
    Mayo, Michael
    Sun, Quan
    2014 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2014, : 2367 - 2374
  • [2] Augmenting interpretable models with large language models during training
    Singh, Chandan
    Askari, Armin
    Caruana, Rich
    Gao, Jianfeng
    NATURE COMMUNICATIONS, 2023, 14 (01)
  • [4] Towards Interpretable Mental Health Analysis with Large Language Models
    Yang, Kailai
    Ji, Shaoxiong
    Zhang, Tianlin
    Xie, Qianqian
    Kuang, Ziyan
    Ananiadou, Sophia
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 6056 - 6077
  • [5] An Embodied Approach for Evolving Robust Visual Classifiers
    Zieba, Karol
    Bongard, Josh
    GECCO'15: PROCEEDINGS OF THE 2015 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2015, : 201 - 208
  • [6] Large Language Models are Visual Reasoning Coordinators
    Chen, Liangyu
    Li, Bo
    Shen, Sheng
    Yang, Jingkang
    Li, Chunyuan
    Keutzer, Kurt
    Darrell, Trevor
    Liu, Ziwei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [7] Visual cognition in multimodal large language models
    Buschoff, Luca M. Schulze
    Akata, Elif
    Bethge, Matthias
    Schulz, Eric
    NATURE MACHINE INTELLIGENCE, 2025, 7 (01) : 96 - 106
  • [8] Large Language Models-Based Local Explanations of Text Classifiers
    Angiulli, Fabrizio
    De Luca, Francesco
    Fassetti, Fabio
    Nistico, Simona
    DISCOVERY SCIENCE, DS 2024, PT I, 2025, 15243 : 19 - 35
  • [9] Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies
    Liu, Yilun
    Tao, Shimin
    Meng, Weibin
    Wang, Jingyu
    Ma, Wenbing
    Chen, Yuhang
    Zhao, Yanqing
    Yang, Hao
    Jiang, Yanfei
    PROCEEDINGS 2024 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC 2024, 2024, : 35 - 46
  • [10] Human-interpretable clustering of short text using large language models
    Miller, Justin K.
    Alexander, Tristram J.
    ROYAL SOCIETY OPEN SCIENCE, 2025, 12 (01):