Comparison of Multi-Modal Large Language Models with Deep Learning Models for Medical Image Classification

Authors
Than, Joel Chia Ming [1]
Vong, Wan Tze [1]
Yong, Kelvin Sheng Chek [1]
Affiliations
[1] Swinburne Univ Technol, Sch Informat Comp & Technol, Sarawak Campus, Kuching, Malaysia
Keywords
LLM; multi-modal; deep learning; classification; image
DOI
10.1109/ICSIPA62061.2024.10687159
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, the advancement of large language models (LLMs) such as GPT-4 and Gemini has opened new avenues for artificial intelligence applications in various domains, including medical image classification. This study compares the performance of multi-modal LLMs with state-of-the-art deep learning networks in classifying tumour and non-tumour images. The performance of four multi-modal LLMs and four conventional deep learning methods was evaluated using several performance measures. The results demonstrate the strengths and limitations of both approaches, providing insights into their applicability and potential integration in clinical practice. Gemini 1.5 Pro performed best among the eight models evaluated. This comparison underscores the evolving role of AI in enhancing diagnostic accuracy and supporting medical professionals in disease detection, especially when training data is scarce.
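The abstract does not name the performance measures used. As a minimal sketch, assuming the usual binary-classification measures (accuracy, precision, recall, specificity, F1), the comparison of any two classifiers on tumour versus non-tumour labels could be scored as below. The function name `binary_metrics`, the label strings, and the toy predictions are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    specificity: float
    f1: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a class is absent.
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive="tumour") -> BinaryMetrics:
    """Score predictions against ground truth for one positive class.

    Assumption: the measures here are common choices, not necessarily
    the ones reported in the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    f1_denominator = precision + recall
    f1 = 2 * precision * recall / f1_denominator if f1_denominator else 0.0

    return BinaryMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        precision=precision,
        recall=recall,
        specificity=_ratio(tn, tn + fp),
        f1=f1,
    )


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the study.
    truth = ["tumour", "tumour", "non-tumour", "non-tumour"]
    predicted = ["tumour", "non-tumour", "non-tumour", "tumour"]
    print(binary_metrics(truth, predicted))
```

Applying the same function to the outputs of each model on a shared test set gives directly comparable scores, whether the predictions come from a prompted multi-modal LLM or a trained network.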
Pages: 5