MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation

Cited by: 0
Authors
Liu, Jun [1 ]
Li, Kunqi [1 ]
Huang, Chun [1 ]
Dong, Hua [1 ]
Song, Yusheng [2 ]
Li, Rihui [3 ,4 ]
Affiliations
[1] Nanchang Hangkong Univ, Dept Informat Engn, Nanchang 330063, Jiangxi, Peoples R China
[2] Peoples Hosp Ganzhou, Dept Intervent Radiol, Ganzhou 341000, Jiangxi, Peoples R China
[3] Univ Macau, Inst Collaborat Innovat, Ctr Cognit & Brain Sci, Macau, Peoples R China
[4] Univ Macau, Fac Sci & Technol, Dept Elect & Comp Engn, Macau, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image segmentation; Transformers; Feature extraction; Semantics; Decoding; Computational modeling; Medical diagnostic imaging; Computer architecture; Computer vision; Convolutional neural networks; Medical image segmentation (SEG); mixed convolutional neural network (CNN)-Transformer backbone; mixed multibranch dilated attention (MMDA); multiscale spatial-aware fusion (MSAF); ATTENTION;
DOI
10.1109/TIM.2024.3497060
CLC Number
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline Code
0808; 0809;
Abstract
Transformers using self-attention mechanisms have recently advanced medical imaging by modeling long-range semantic dependencies, though they lack the ability of convolutional neural networks (CNNs) to capture local spatial details. This study introduced a novel segmentation (SEG) network derived from a mixed CNN-Transformer (MixFormer) feature extraction backbone to enhance medical image segmentation. The MixFormer network seamlessly integrates global and local information from the Transformer and CNN branches during downsampling. To comprehensively capture the interscale perspective, we introduced a multiscale spatial-aware fusion (MSAF) module, enabling effective interaction between coarse and fine feature representations. In addition, we proposed a mixed multibranch dilated attention (MMDA) module to bridge the semantic gap between the encoding and decoding stages while emphasizing specific regions. Finally, we implemented a CNN-based upsampling approach to recover low-level features, substantially improving segmentation accuracy.

Experimental validations on widely used medical image datasets demonstrated the superior performance of MixFormer. On the Synapse dataset, our approach achieved a mean Dice similarity coefficient (DSC) of 82.64% and a mean Hausdorff distance (HD) of 12.67 mm. On the automated cardiac diagnosis challenge (ACDC) dataset, the DSC was 91.01%. On the international skin imaging collaboration (ISIC) 2018 dataset, the model achieved a mean intersection over union (mIoU) of 0.841, an accuracy of 0.958, a precision of 0.910, a recall of 0.934, and an F1 score of 0.913. For the Kvasir-SEG dataset, we recorded a mean Dice of 0.9247, an mIoU of 0.8615, a precision of 0.9181, and a recall of 0.9463. On the computer vision center (CVC)-ClinicDB dataset, the results were a mean Dice of 0.9441, an mIoU of 0.8922, a precision of 0.9437, and a recall of 0.9458. These findings underscore the superior segmentation performance of MixFormer compared to most mainstream segmentation networks, including CNNs and other Transformer-based architectures.
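The abstract describes the MMDA module only at a high level: parallel branches with different dilation rates whose outputs are fused so that several receptive-field sizes are covered at once. The following pure-Python sketch illustrates that core mechanism on a 1-D signal; the function names, the uniform kernel, and the summation-based fusion are illustrative assumptions, not the paper's actual implementation:

```python
def dilated_conv1d(x, w, d):
    """1-D dilated convolution with 'valid' padding: kernel taps are spaced
    d positions apart, so a k-tap kernel covers d*(k-1)+1 input positions."""
    k = len(w)
    n_out = len(x) - d * (k - 1)
    return [sum(w[j] * x[i + j * d] for j in range(k)) for i in range(n_out)]

def multibranch_dilated(x, w, dilations=(1, 2, 3)):
    """Run parallel branches with different dilation rates and sum them over
    their common support, mixing several receptive-field sizes in one output."""
    branches = [dilated_conv1d(x, w, d) for d in dilations]
    n = min(len(b) for b in branches)  # crop every branch to the shortest one
    return [sum(b[i] for b in branches) for i in range(n)]

x = [float(v) for v in range(10)]
w = [1.0, 1.0, 1.0]
print(multibranch_dilated(x, w))  # -> [18.0, 27.0, 36.0, 45.0]
```

With a 3-tap kernel, the branches at dilations 1, 2, and 3 see 3, 5, and 7 input positions respectively, so the fused response combines fine local detail with wider context; bridging local and long-range information in this way is the role the abstract assigns to MMDA between the encoding and decoding stages.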
Pages: 20
Related Papers
50 total
  • [1] TFCNs: A CNN-Transformer Hybrid Network for Medical Image Segmentation
    Li, Zihan
    Li, Dihan
    Xu, Cangbai
    Wang, Weice
    Hong, Qingqi
    Li, Qingde
    Tian, Jie
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 781 - 792
  • [2] HTC-Net: A hybrid CNN-transformer framework for medical image segmentation
    Tang, Hui
    Chen, Yuanbin
    Wang, Tao
    Zhou, Yuanbo
    Zhao, Longxuan
    Gao, Qinquan
    Du, Min
    Tan, Tao
    Zhang, Xinlin
    Tong, Tong
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 88
  • [3] Alternate encoder and dual decoder CNN-Transformer networks for medical image segmentation
    Zhang, Lin
    Guo, Xinyu
    Sun, Hongkun
    Wang, Weigang
    Yao, Liwei
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [4] Multi-Scale Orthogonal Model CNN-Transformer for Medical Image Segmentation
    Zhou, Wuyi
    Zeng, Xianhua
    Zhou, Mingkun
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2023, 37 (10)
  • [5] UCTNet: Uncertainty-guided CNN-Transformer hybrid networks for medical image segmentation
    Guo, Xiayu
    Lin, Xian
    Yang, Xin
    Yu, Li
    Cheng, Kwang-Ting
    Yan, Zengqiang
    PATTERN RECOGNITION, 2024, 152
  • [6] RAMIS: Increasing robustness and accuracy in medical image segmentation with hybrid CNN-transformer synergy
    Gu, Jia
    Tian, Fangzheng
    Oh, Il-Seok
    NEUROCOMPUTING, 2025, 618
  • [7] LATrans-Unet: Improving CNN-Transformer with Location Adaptive for Medical Image Segmentation
    Lin, Qiqin
    Yao, Junfeng
    Hong, Qingqi
    Cao, Xianpeng
    Zhou, Rongzhou
    Xie, Weixing
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT XIII, 2024, 14437 : 223 - 234
  • [8] CNN-Transformer Hybrid Architecture for Underwater Sonar Image Segmentation
    Lei, Juan
    Wang, Huigang
    Lei, Zelin
    Li, Jiayuan
    Rong, Shaowei
    REMOTE SENSING, 2025, 17 (04)
  • [9] Enhancing Hybrid CNN-Transformer via Frequency-Based Bridging for Medical Image Segmentation
    Zeng, Xinyi
    Tang, Cheng
    Zeng, Pinxian
    Cui, Jiaqi
    Yan, Binyu
    Wang, Peng
    Wang, Yan
    IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI 2024, 2024
  • [10] HCTNet: A hybrid CNN-transformer network for breast ultrasound image segmentation
    He, Qiqi
    Yang, Qiuju
    Xie, Minghao
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 155