Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution

Cited by: 25
Authors
Cai, Yimin [1]
Long, Yuqing [2]
Han, Zhenggong [3]
Liu, Mingkun [1]
Zheng, Yuchen [1]
Yang, Wei [1]
Chen, Liming [4]
Affiliations
[1] Guizhou Univ, Sch Med, Guiyang, Peoples R China
[2] ZunYi Med Univ, Sch Stomatolog, Zunyi, Peoples R China
[3] Guizhou Univ, Key Lab Adv Mfg Technol, Minist Educ, Guiyang, Peoples R China
[4] Guizhou Univ, Dent Hosp Guizhou Univ, Guiyang Dent Hosp, Guiyang, Peoples R China
Keywords
Deep learning; Medical image segmentation; 3D Swin Transformer; Brain tumor;
DOI
10.1186/s12911-023-02129-z
CLC Number
R-058
Subject Classification Code
Abstract
Background: Semantic segmentation of brain tumors plays a critical role in clinical treatment, especially for three-dimensional (3D) magnetic resonance imaging, which is widely used in clinical practice. Automatic segmentation of the 3D structure of brain tumors quickly gives physicians an understanding of tumor properties such as shape and size, improving the efficiency of preoperative planning and the odds of successful surgery. Over the past decades, 3D convolutional neural networks (CNNs) have dominated automatic segmentation of 3D medical images and have achieved good results. However, to limit the number of network parameters, the kernels used in 3D convolutions are generally no larger than 7 × 7 × 7, which restricts a CNN's ability to learn long-distance dependencies. The Vision Transformer (ViT) excels at learning long-distance dependencies in images, but it has a large number of parameters and, when training data are insufficient, cannot learn local dependency information in its earlier layers. In image segmentation, however, learning this local dependency information in the earlier layers has a substantial impact on model performance.
Methods: This paper proposes the Swin Unet3D model, which formulates voxel segmentation of medical images as a sequence-to-sequence prediction. The feature extraction sub-module of the model is designed as a parallel structure of convolution and ViT, so that every layer of the model can adequately learn both global and local dependency information in the image.
Results: On the Brats2021 validation dataset, the proposed model achieves Dice coefficients of 0.840, 0.874, and 0.911 on the ET, TC, and WT channels, respectively. On the Brats2018 validation dataset, it achieves Dice coefficients of 0.716, 0.761, and 0.874 on the corresponding channels.
Conclusion: We propose a new segmentation model that combines the advantages of the Vision Transformer and convolution and achieves a better balance between the number of model parameters and segmentation accuracy. The code can be found at https://github.com/1152545264/SwinUnet3D.
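
The PyTorch sketch below illustrates the idea described in the Methods section: a feature-extraction block in which a local 3D convolution branch and a ViT-style self-attention branch run in parallel and their outputs are fused, so a single block can capture both local and long-distance dependencies. This is not the authors' implementation (the released code at the GitHub link above uses Swin-style windowed attention inside a U-shaped encoder-decoder); the class name ParallelConvViTBlock, the full-sequence attention, and the 1 × 1 × 1 fusion convolution are illustrative assumptions.

import torch
import torch.nn as nn


class ParallelConvViTBlock(nn.Module):
    """Hypothetical block: parallel 3D convolution and self-attention branches."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: a small 3D convolution captures local dependencies.
        self.conv_branch = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.GELU(),
        )
        # Global branch: multi-head self-attention over all voxels of the
        # (already downsampled) feature map captures long-distance dependencies.
        # The actual Swin Unet3D uses windowed (Swin-style) attention instead.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Fusion: concatenate both branches and project back to `channels`.
        self.fuse = nn.Conv3d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, D, H, W)
        b, c, d, h, w = x.shape
        local = self.conv_branch(x)

        tokens = x.flatten(2).transpose(1, 2)      # (B, D*H*W, C): voxels as tokens
        tokens = self.norm(tokens)
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, d, h, w)

        # Residual connection around the fused local + global features.
        return self.fuse(torch.cat([local, global_feat], dim=1)) + x


if __name__ == "__main__":
    block = ParallelConvViTBlock(channels=32)
    feat = torch.randn(1, 32, 8, 8, 8)             # a small 3D feature map
    print(block(feat).shape)                       # torch.Size([1, 32, 8, 8, 8])

In the full network, blocks of this kind would sit at each resolution level of the encoder and decoder, which is why the abstract emphasizes that every layer, not just the deepest ones, can learn both kinds of dependency.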
Pages: 13
Related Papers
50 items in total
  • [21] Swin transformer with multiscale 3D atrous convolution for hyperspectral image classification
    Farooque, Ghulam
    Liu, Qichao
    Sargano, Allah Bux
    Xiao, Liang
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [22] TransGraphNet: A novel network for medical image segmentation based on transformer and graph convolution
    Zhang, Ju
    Ye, Zhiyi
    Chen, Mingyang
    Yu, Jiahao
    Cheng, Yun
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 104
  • [23] STM-UNet: An Efficient U-shaped Architecture Based on Swin Transformer and Multiscale MLP for Medical Image Segmentation
    Shi, Lei
    Gao, Tianyu
    Zhang, Zheng
    Zhang, Junxing
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 2003 - 2008
  • [24] TAC-UNet: transformer-assisted convolutional neural network for medical image segmentation
    He, Jingliu
    Ma, Yuqi
    Yang, Mingyue
    Yang, Wensong
    Wu, Chunming
    Chen, Shanxiong
    QUANTITATIVE IMAGING IN MEDICINE AND SURGERY, 2024, 14 (12) : 8824 - 8839
  • [25] HCA-former: Hybrid Convolution Attention Transformer for 3D Medical Image Segmentation
    Yang, Fan
    Wang, Fan
    Dong, Pengwei
    Wang, Bo
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 90
  • [26] DECTNet: Dual Encoder Network combined convolution and Transformer architecture for medical image segmentation
    Li, Boliang
    Xu, Yaming
    Wang, Yan
    Zhang, Bo
    PLOS ONE, 2024, 19 (04):
  • [27] CTRANS: A Multi-Resolution Convolution-Transformer Network for Medical Image Segmentation
    Gong, Zhendi
    French, Andrew P.
    Qiu, Guoping
    Chen, Xin
    IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI 2024, 2024,
  • [28] DSML-UNet: Depthwise separable convolution network with multiscale large kernel for medical image segmentation
    Wang, Biao
    Qin, Juan
    Lv, Lianrong
    Cheng, Mengdan
    Li, Lei
    He, Junjie
    Li, Dingyao
    Xia, Dan
    Wang, Meng
    Ren, Haiping
    Wang, Shike
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 97
  • [29] A Transformer-Based Network for Anisotropic 3D Medical Image Segmentation
    Guo, Danfeng
    Terzopoulos, Demetri
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 8857 - 8861
  • [30] 3V3D: Three-View Contextual Cross-slice Difference Three-dimensional Medical Image Segmentation Adversarial Network
    Zeng, Xianhua
    Chen, Saiyuan
    Xie, Yicai
    Liao, Tianxing
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (06)