Multimodal Ensembling for Zero-Shot Image Classification

Citations: 0
Author
Hickmon, Javon [1]
Affiliation
[1] Univ Washington, Dept Comp Sci, Seattle, WA 98195 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Artificial intelligence has made significant progress in image classification, an essential task if machine perception is to reach human-level image understanding. Despite recent advances in vision-language research, multimodal image classification remains challenging for two reasons. First, low-capacity models often underfit and therefore underperform on fine-grained image classification. Second, performance depends on high-quality data with rich cross-modal representations of each class, which is often difficult to generate. Here, we use ensemble learning to reduce the impact of these issues on pre-trained models. We aim to build a meta-model that combines the predictions of multiple open-vocabulary multimodal models, each trained on different data, into more robust and accurate predictions. By combining ensemble learning with multimodal machine learning, we will achieve higher prediction accuracy without any additional training or fine-tuning; the method is therefore fully zero-shot.
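The abstract does not state how the meta-model combines the individual predictions. The sketch below assumes the simplest training-free rule, soft voting (averaging each model's class probabilities), applied to CLIP-style models that score an image by cosine similarity between its embedding and text embeddings of class prompts. The embeddings, the logit scale, and the function names (`zero_shot_probs`, `ensemble_probs`) are hypothetical illustrations, not the author's implementation; a real system would obtain the embeddings from pre-trained image and text encoders.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_emb: Vector, class_text_embs: Sequence[Vector],
                    logit_scale: float = 100.0) -> List[float]:
    """CLIP-style zero-shot scoring for ONE model (assumed, not from the source):
    cosine similarity between the image embedding and each class-prompt
    embedding, multiplied by a logit scale, then passed through softmax."""
    img = l2_normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, l2_normalize(t)))
              for t in class_text_embs]
    return softmax(logits)


def ensemble_probs(per_model_probs: Sequence[Sequence[float]],
                   weights: Optional[Sequence[float]] = None) -> List[float]:
    """Soft voting: a (weighted) average of each model's class probabilities.
    Uses uniform weights when none are given; nothing is trained."""
    n_models = len(per_model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    total_w = sum(weights)
    n_classes = len(per_model_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, per_model_probs)) / total_w
            for c in range(n_classes)]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(xs)), key=lambda i: xs[i])


# Usage with hypothetical precomputed embeddings. Each model lives in its own
# embedding space, so each scores the image against its own class prompts.
classes = ["cat", "dog", "bird"]
prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
model_a = zero_shot_probs([0.9, 0.4, 0.1], prompts, logit_scale=10.0)
model_b = zero_shot_probs([0.5, 0.6, 0.1], prompts, logit_scale=10.0)
combined = ensemble_probs([model_a, model_b])
print(classes[argmax(model_a)], classes[argmax(model_b)], classes[argmax(combined)])
# prints: cat dog cat
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh a less certain one: here model A is nearly certain of "cat" while model B only mildly prefers "dog", so the ensemble predicts "cat". Because no parameters are fitted, the combination stays zero-shot; the optional `weights` would have to be fixed by hand or by a rule that uses no labeled data.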
Pages: 23747-23749
Number of pages: 3