Comparison of Multi-Modal Large Language Models with Deep Learning Models for Medical Image Classification

Cited by: 0
Authors
Than, Joel Chia Ming [1]
Vong, Wan Tze [1]
Yong, Kelvin Sheng Chek [1]
Affiliation
[1] Swinburne Univ Technol, Sch Informat Comp & Technol, Sarawak Campus, Kuching, Malaysia
Keywords
LLM; multi-modal; deep learning; classification; image
DOI
10.1109/ICSIPA62061.2024.10687159
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, the advancement of large language models (LLMs) such as GPT-4 and Gemini has opened new avenues for artificial intelligence applications in various domains, including medical image classification. This study aims to compare the performance of multi-modal LLMs with state-of-the-art deep learning networks in classifying tumour and non-tumour images. The performance of four multi-modal LLMs and four conventional deep learning methods was evaluated using several performance measures. The results demonstrate the strengths and limitations of both approaches, providing insights into their applicability and potential integration in clinical practice. Gemini 1.5 Pro performed best of the eight models evaluated. This comparison underscores the evolving role of AI in enhancing diagnostic accuracy and supporting medical professionals in disease detection, especially when training data is scarce.
Pages: 5
Related papers
50 in total
  • [1] Exploring Fusion Strategies in Deep Learning Models for Multi-Modal Classification
    Zhang, Duoyi
    Nayak, Richi
    Bashar, Md Abul
    DATA MINING, AUSDM 2021, 2021, 1504 : 102 - 117
  • [2] Split Learning of Multi-Modal Medical Image Classification
    Ghosh, Bishwamittra
    Wang, Yuan
    Fu, Huazhu
    Wei, Qingsong
    Liu, Yong
    Goh, Rick Siow Mong
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 1326 - 1331
  • [3] Enhancing Image Classification Models with Multi-modal Biomarkers
    Caban, Jesus J.
    Liao, David
    Yao, Jianhua
    Mollura, Daniel J.
    Gochuico, Bernadette
    Yoo, Terry
    MEDICAL IMAGING 2011: COMPUTER-AIDED DIAGNOSIS, 2011, 7963
  • [4] Visual Hallucinations of Multi-modal Large Language Models
    Huang, Wen
    Liu, Hongbin
    Guo, Minxin
    Gong, Neil Zhenqiang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 9614 - 9631
  • [5] Generative Multi-Modal Knowledge Retrieval with Large Language Models
    Long, Xinwei
    Zeng, Jiali
    Meng, Fandong
    Ma, Zhiyuan
    Zhang, Kaiyan
    Zhou, Bowen
    Zhou, Jie
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 18733 - 18741
  • [6] Distributed Training and Inference of Deep Learning Models for Multi-Modal Land Cover Classification
    Aspri, Maria
    Tsagkatakis, Grigorios
    Tsakalides, Panagiotis
    REMOTE SENSING, 2020, 12 (17)
  • [7] Incorporating Concreteness in Multi-Modal Language Models with Curriculum Learning
    Sezerer, Erhan
    Tekir, Selma
    APPLIED SCIENCES-BASEL, 2021, 11 (17)
  • [8] SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
    Lin, Ziyi
    Liu, Dongyang
    Zhang, Renrui
    Gao, Peng
    Qiu, Longtian
    Xiao, Han
    Qiu, Han
    Shao, Wenqi
    Chen, Keqin
    Han, Jiaming
    Huang, Siyuan
    Zhang, Yichi
    He, Xuming
    Qiao, Yu
    Li, Hongsheng
    COMPUTER VISION - ECCV 2024, PT LXII, 2025, 15120 : 36 - 55
  • [9] Multi-modal large language models in radiology: principles, applications, and potential
    Shen, Yiqiu
    Xu, Yanqi
    Ma, Jiajian
    Rui, Wushuang
    Zhao, Chen
    Heacock, Laura
    Huang, Chenchan
    ABDOMINAL RADIOLOGY, 2024
  • [10] Explainability of deep learning models in medical image classification
    Kolarik, Michal
    Sarnovsky, Martin
    Paralic, Jan
    Butka, Peter
    2022 IEEE 22ND INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND INFORMATICS AND 8TH IEEE INTERNATIONAL CONFERENCE ON RECENT ACHIEVEMENTS IN MECHATRONICS, AUTOMATION, COMPUTER SCIENCE AND ROBOTICS (CINTI-MACRO), 2022, : 233 - 238