Comparison of Multi-Modal Large Language Models with Deep Learning Models for Medical Image Classification

Times cited: 0

Authors:

Than, Joel Chia Ming [1]

Vong, Wan Tze [1]

Yong, Kelvin Sheng Chek [1]

Affiliation:

[1] Swinburne Univ Technol, Sch Informat Comp & Technol, Sarawak Campus, Kuching, Malaysia

Source:

2024 IEEE 8th International Conference on Signal and Image Processing Applications (ICSIPA), 2024

Keywords:

LLM; multi-modal; deep learning; classification; image

DOI:

10.1109/ICSIPA62061.2024.10687159

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, advances in large language models (LLMs) such as GPT-4 and Gemini have opened new avenues for artificial intelligence applications in various domains, including medical image classification. This study compares the performance of multi-modal LLMs with that of state-of-the-art deep learning networks in classifying tumour and non-tumour images. Four multi-modal LLMs and four conventional deep learning methods were evaluated using several performance measures. The results demonstrate the strengths and limitations of both approaches, providing insights into their applicability and potential integration into clinical practice. Gemini 1.5 Pro performed best of the eight models evaluated. This comparison underscores the evolving role of AI in enhancing diagnostic accuracy and supporting medical professionals in disease detection, especially when training data are scarce.


Pages: 5
