Comparison of Multi-Modal Large Language Models with Deep Learning Models for Medical Image Classification

Cited: 0
Authors
Than, Joel Chia Ming [1 ]
Vong, Wan Tze [1 ]
Yong, Kelvin Sheng Chek [1 ]
Affiliations
[1] Swinburne Univ Technol, Sch Informat Comp & Technol, Sarawak Campus, Kuching, Malaysia
Keywords
LLM; multi-modal; deep learning; classification; image;
DOI
10.1109/ICSIPA62061.2024.10687159
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, the advancement of large language models (LLMs) such as GPT-4 and Gemini has opened new avenues for artificial intelligence applications in various domains, including medical image classification. This study compares the performance of multi-modal LLMs with state-of-the-art deep learning networks in classifying tumour and non-tumour images. The performance of four multi-modal LLMs and four conventional deep learning methods was evaluated using several performance measures. The results demonstrate the strengths and limitations of both approaches, providing insights into their applicability and potential integration in clinical practice. Gemini 1.5 Pro performs the best of the eight models evaluated. This comparison underscores the evolving role of AI in enhancing diagnostic accuracy and supporting medical professionals in disease detection, especially when training data is scarce.
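The abstract states that the models were compared on several performance measures for a binary tumour/non-tumour task. The paper does not specify its metrics here; as an illustrative sketch with hypothetical labels, the standard measures for such a comparison (accuracy, sensitivity, specificity) can be computed in plain Python as:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = tumour, 0 = non-tumour)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # true negative rate
    }

# Hypothetical ground truth and one model's predictions on a small test set
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))  # {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
```

Running the same function over each model's predictions on a shared test set gives directly comparable scores, which is the structure the eight-model comparison above implies.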
Pages: 5