Comparison of Multi-Modal Large Language Models with Deep Learning Models for Medical Image Classification

Times cited: 0

Authors:

Than, Joel Chia Ming [1]

Vong, Wan Tze [1]

Yong, Kelvin Sheng Chek [1]

Affiliations:

[1] Swinburne Univ Technol, Sch Informat Comp & Technol, Sarawak Campus, Kuching, Malaysia

Source:

2024 IEEE 8TH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING APPLICATIONS, ICSIPA | 2024

Keywords:

LLM; multi-modal; deep learning; classification; image

DOI:

10.1109/ICSIPA62061.2024.10687159

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, the advancement of large language models (LLMs) such as GPT-4 and Gemini has opened new avenues for artificial intelligence applications in various domains, including medical image classification. This study compares the performance of multi-modal LLMs with state-of-the-art deep learning networks in classifying tumour and non-tumour images. The performance of four multi-modal LLMs and four conventional deep learning methods was evaluated using several performance measures. The results demonstrate the strengths and limitations of both approaches, providing insights into their applicability and potential integration in clinical practice. Gemini 1.5 Pro performed best among the eight models evaluated. This comparison underscores the evolving role of AI in enhancing diagnostic accuracy and supporting medical professionals in disease detection, especially when training data is scarce.
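The abstract does not name the performance measures used. As an illustration only, the sketch below assumes the measures commonly reported for a two-class tumour versus non-tumour task (accuracy, sensitivity, specificity, precision and F1) and computes them from predicted and true labels. The function name `binary_metrics`, the `BinaryMetrics` container and the label strings are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # recall on the positive (tumour) class
    specificity: float  # recall on the negative (non-tumour) class
    precision: float
    f1: float


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, positive="tumour") -> BinaryMetrics:
    """Compute confusion-matrix metrics for a two-class problem.

    Any label other than `positive` is treated as the negative class
    (an assumption made for this sketch).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    precision = _safe_div(tp, tp + fp)
    sensitivity = _safe_div(tp, tp + fn)
    return BinaryMetrics(
        accuracy=_safe_div(tp + tn, len(pairs)),
        sensitivity=sensitivity,
        specificity=_safe_div(tn, tn + fp),
        precision=precision,
        f1=_safe_div(2 * precision * sensitivity, precision + sensitivity),
    )


if __name__ == "__main__":
    # Toy labels for illustration; these are not data from the study.
    truth = ["tumour", "tumour", "tumour", "non-tumour", "non-tumour", "non-tumour"]
    preds = ["tumour", "tumour", "non-tumour", "non-tumour", "non-tumour", "tumour"]
    print(binary_metrics(truth, preds))
```

The same function could score any of the eight models, whether the predictions come from a multi-modal LLM's text response mapped to a label or from a deep learning network's thresholded output, so the two families are compared on identical terms.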

Pages: 5
