Comparison of Multi-Modal Large Language Models with Deep Learning Models for Medical Image Classification

Cited: 0
Authors
Than, Joel Chia Ming [1 ]
Vong, Wan Tze [1 ]
Yong, Kelvin Sheng Chek [1 ]
Institutions
[1] Swinburne Univ Technol, Sch Informat Comp & Technol, Sarawak Campus, Kuching, Malaysia
Keywords
LLM; multi-modal; deep learning; classification; image;
DOI
10.1109/ICSIPA62061.2024.10687159
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
In recent years, the advancement of large language models (LLMs) such as GPT-4 and Gemini has opened new avenues for artificial intelligence applications in various domains, including medical image classification. This study compares the performance of multi-modal LLMs with state-of-the-art deep learning networks in classifying tumour and non-tumour images. The performance of four multi-modal LLMs and four conventional deep learning methods was evaluated using several performance measures. The results demonstrate the strengths and limitations of both approaches, providing insights into their applicability and potential integration in clinical practice. Gemini 1.5 Pro performs the best of the eight models evaluated. This comparison underscores the evolving role of AI in enhancing diagnostic accuracy and supporting medical professionals in disease detection, especially when training data is scarce.
Pages: 5