Cervical OCT image classification using contrastive masked autoencoders with Swin Transformer

Cited: 0
Authors
Wang, Qingbin [1 ]
Xiong, Yuxuan [1 ]
Zhu, Hanfeng [2 ,3 ]
Mu, Xuefeng [4 ]
Zhang, Yan [4 ]
Ma, Yutao [2 ,3 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Peoples R China
[3] Cent China Normal Univ, Hubei Prov Key Lab Artificial Intelligence & Smart, Wuhan 430079, Peoples R China
[4] Wuhan Univ, Renmin Hosp, Dept Obstet & Gynecol, Wuhan 430060, Peoples R China
Keywords
Cervical cancer; Optical coherence tomography; Image classification; Self-supervised learning; Swin Transformer; Interpretability; OPTICAL COHERENCE TOMOGRAPHY;
DOI
10.1016/j.compmedimag.2024.102469
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline Code
0831
Abstract
Background and Objective: Cervical cancer poses a major health threat to women globally. Optical coherence tomography (OCT) imaging has recently shown promise for non-invasive diagnosis of cervical lesions. However, obtaining high-quality labeled cervical OCT images is challenging and time-consuming because they must correspond precisely with pathological results. The scarcity of such high-quality labeled data hinders the application of supervised deep-learning models in practical clinical settings. This study addresses this challenge by proposing CMSwin, a novel self-supervised learning (SSL) framework that combines masked image modeling (MIM) with contrastive learning on the Swin-Transformer architecture to exploit abundant unlabeled cervical OCT images. Methods: In this contrastive-MIM framework, mixed image encoding is combined with a latent contextual regressor to resolve the inconsistency between pre-training and fine-tuning and to separate the encoder's feature-extraction task from the decoder's reconstruction task, allowing the encoder to extract better image representations. In addition, contrastive losses at the patch and image levels are carefully designed to leverage massive unlabeled data. Results: We validated the superiority of CMSwin over state-of-the-art SSL approaches with five-fold cross-validation on an OCT image dataset containing 1,452 patients from a multi-center clinical study in China, plus two external validation sets from top-ranked Chinese hospitals: the Huaxi dataset from the West China Hospital of Sichuan University and the Xiangya dataset from the Second Xiangya Hospital of Central South University. A human-machine comparison experiment on the Huaxi and Xiangya datasets for volume-level binary classification also indicates that CMSwin can match or exceed the average level of four skilled medical experts, especially in identifying high-risk cervical lesions.
Conclusion: Our work has great potential to assist gynecologists in intelligently interpreting cervical OCT images in clinical settings. Additionally, the integrated Grad-CAM module of CMSwin enables cervical lesion visualization, giving gynecologists interpretable evidence for diagnosing cervical diseases efficiently.
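The abstract does not give the exact loss formulation, so the following is only a rough illustration of how a contrastive-MIM objective of this kind is typically composed: a reconstruction loss over masked patches plus InfoNCE-style contrastive terms at the image and patch levels. All function names, the weights `w_img` and `w_patch`, and the temperature `tau` are illustrative assumptions, not CMSwin's actual implementation.

```python
import math

def mse_masked(pred, target, mask):
    # MIM-style reconstruction loss: mean squared error computed only
    # over the masked positions (mask[i] == 1).
    num = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    den = max(sum(mask), 1)
    return num / den

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    # InfoNCE contrastive loss: pull the anchor toward its positive view
    # and push it away from the negatives, with temperature tau.
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

def contrastive_mim_loss(pred, target, mask,
                         img_anchor, img_pos, img_negs,
                         patch_anchor, patch_pos, patch_negs,
                         w_img=1.0, w_patch=1.0):
    # Total pre-training objective: masked reconstruction plus
    # image-level and patch-level contrastive terms (hypothetical weights).
    return (mse_masked(pred, target, mask)
            + w_img * info_nce(img_anchor, img_pos, img_negs)
            + w_patch * info_nce(patch_anchor, patch_pos, patch_negs))
```

In practice each term would operate on encoder/decoder outputs of a Swin backbone rather than raw lists; the sketch only shows how the three loss components combine into one objective.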
Pages: 14
Related Papers (50 total)
  • [1] Transformer-Based Masked Autoencoder With Contrastive Loss for Hyperspectral Image Classification
    Cao, Xianghai
    Lin, Haifeng
    Guo, Shuaixu
    Xiong, Tao
    Jiao, Licheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [2] Contrastive Transformer Masked Image Hashing for Degraded Image Retrieval
    Shen, Xiaobo
    Cai, Haoyu
    Gong, Xiuwen
    Zheng, Yuhui
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 1218 - 1226
  • [3] Spectral-Spatial Masked Transformer With Supervised and Contrastive Learning for Hyperspectral Image Classification
    Huang, Lingbo
    Chen, Yushi
    He, Xin
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [4] Swin-RSIC: remote sensing image classification using a modified swin transformer with explainability
    Ansith S
    Ananth A
    Ebin Deni Raj
    Kala S
    EARTH SCIENCE INFORMATICS, 2025, 18 (2)
  • [5] MalSort: Lightweight and efficient image-based malware classification using masked self-supervised framework with Swin Transformer
    Wang, Fangwei
    Shi, Xipeng
    Yang, Fang
    Song, Ruixin
    Li, Qingru
    Tan, Zhiyuan
    Wang, Changguang
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2024, 83
  • [6] Spectral Swin Transformer Network for Hyperspectral Image Classification
    Liu, Baisen
    Liu, Yuanjia
    Zhang, Wulin
    Tian, Yiran
    Kong, Weili
    REMOTE SENSING, 2023, 15 (15)
  • [7] SPT-Swin: A Shifted Patch Tokenization Swin Transformer for Image Classification
    Ferdous, Gazi Jannatul
    Sathi, Khaleda Akhter
    Hossain, Md. Azad
    Dewan, M. Ali Akber
    IEEE ACCESS, 2024, 12 : 117617 - 117626
  • [8] A 3-D-Swin Transformer-Based Hierarchical Contrastive Learning Method for Hyperspectral Image Classification
    Huang, Xin
    Dong, Mengjie
    Li, Jiayi
    Guo, Xian
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [9] Adaptive Masked Autoencoder Transformer for image classification
    Chen, Xiangru
    Liu, Chenjing
    Hu, Peng
    Lin, Jie
    Gong, Yunhong
    Chen, Yingke
    Peng, Dezhong
    Geng, Xue
    APPLIED SOFT COMPUTING, 2024, 164
  • [10] Vision transformer with masked autoencoders for referable diabetic retinopathy classification based on large-size retina image
    Yang, Yaoming
    Cai, Zhili
    Qiu, Shuxia
    Xu, Peng
    PLOS ONE, 2024, 19 (03)