Classification of Mobile-Based Oral Cancer Images Using the Vision Transformer and the Swin Transformer

Cited by: 8
Authors
Song, Bofan [1]
Raj, Dharma K. C. [2]
Yang, Rubin Yuchan [2]
Li, Shaobai [1]
Zhang, Chicheng [2]
Liang, Rongguang [1]
Affiliations
[1] Univ Arizona, Wyant Coll Opt Sci, Tucson, AZ 85721 USA
[2] Univ Arizona, Comp Sci Dept, Tucson, AZ 85721 USA
Keywords
Vision Transformer; Swin Transformer; oral cancer; oral image analysis; artificial intelligence
DOI
10.3390/cancers16050987
Chinese Library Classification (CLC) number
R73 [Oncology]
Discipline classification code
100214
Abstract
Simple Summary: Transformer models, originally successful in natural language processing, have found application in computer vision and have shown promising results in cancer image analysis. Oral cancer is a prevalent and rapidly spreading cancer worldwide, and accurate automated analysis methods for it are urgently needed. This need is particularly critical for high-risk populations in low- and middle-income countries. In this study, we evaluated the performance of the Vision Transformer (ViT) and the Swin Transformer in classifying mobile-based oral cancer images that we collected from high-risk populations. The Swin Transformer model achieved higher accuracy than the ViT model, and both transformer models outperformed the conventional convolutional model VGG19.

Abstract: Oral cancer, a pervasive and rapidly growing malignant disease, poses a significant global health concern. Early and accurate diagnosis is pivotal for improving patient outcomes. Automatic diagnosis methods based on artificial intelligence have shown promising results in the oral cancer field, but their accuracy still needs to improve for realistic diagnostic scenarios. Vision Transformers (ViT) have recently outperformed CNN models in many computer vision benchmark tasks. This study explores the effectiveness of the Vision Transformer and the Swin Transformer, two cutting-edge variants of the transformer architecture, for mobile-based oral cancer image classification. The pre-trained Swin Transformer model achieved 88.7% accuracy in the binary classification task, outperforming the ViT model by 2.3%, while the conventional convolutional network models VGG19 and ResNet50 achieved 85.2% and 84.5% accuracy, respectively. Our experiments demonstrate that these transformer-based architectures outperform traditional convolutional neural networks in oral cancer image classification, and they underscore the potential of the ViT and the Swin Transformer to advance the state of the art in oral cancer image analysis.
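The accuracy comparison reported in the abstract can be laid out as a short sketch. This is only an illustration, not the authors' evaluation code: the helper names `accuracy` and `rank_models` are hypothetical, and the ViT figure is derived under the assumption that "by 2.3%" means a gap of 2.3 percentage points.

```python
# Accuracies reported in the abstract (binary classification, in percent).
reported = {
    "Swin Transformer": 88.7,
    "VGG19": 85.2,
    "ResNet50": 84.5,
}
# Assumption: "by 2.3%" is a gap in percentage points, so ViT = 88.7 - 2.3.
reported["ViT"] = round(reported["Swin Transformer"] - 2.3, 1)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must have equal length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(scores):
    """Return (model, accuracy) pairs sorted from best to worst."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Print each model with its gap to the best-performing model.
    for name, acc in rank_models(reported):
        gap = reported["Swin Transformer"] - acc
        print(f"{name:<17} {acc:5.1f}%  (gap to Swin: {gap:4.1f} points)")
```

Under that reading, the ranking is Swin Transformer, ViT, VGG19, ResNet50, which matches the abstract's claim that both transformer models outperform the convolutional baselines.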
Pages: 10