Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution

Times cited: 25
Authors
Cai, Yimin [1 ]
Long, Yuqing [2 ]
Han, Zhenggong [3 ]
Liu, Mingkun [1 ]
Zheng, Yuchen [1 ]
Yang, Wei [1 ]
Chen, Liming [4 ]
Affiliations
[1] Guizhou Univ, Sch Med, Guiyang, Peoples R China
[2] ZunYi Med Univ, Sch Stomatolog, Zunyi, Peoples R China
[3] Guizhou Univ, Key Lab Adv Mfg Technol, Minist Educ, Guiyang, Peoples R China
[4] Guizhou Univ, Dent Hosp Guizhou Univ, Guiyang Dent Hosp, Guiyang, Peoples R China
Keywords
Deep learning; Medical image segmentation; 3D Swin Transformer; Brain tumor
DOI
10.1186/s12911-023-02129-z
Chinese Library Classification number
R-058
Abstract
Background: Semantic segmentation of brain tumors plays a critical role in clinical treatment, especially with three-dimensional (3D) magnetic resonance imaging, which is widely used in clinical practice. Automatic segmentation of the 3D structure of a brain tumor can quickly help physicians understand tumor properties such as shape and size, improving the efficiency of preoperative planning and the odds of successful surgery. Over the past decades, 3D convolutional neural networks (CNNs) have dominated automatic segmentation of 3D medical images, and these architectures have achieved good results. However, to keep the number of network parameters manageable, the kernels of 3D convolutions are generally no larger than 7 × 7 × 7, which limits the ability of CNNs to learn long-range dependencies. The Vision Transformer (ViT) is very good at learning long-range dependencies in images, but it has a large number of parameters. Worse, when training data are insufficient, a ViT cannot learn local dependencies in its early layers, and in image segmentation the ability to learn such local dependencies in the early layers has a large impact on model performance.

Methods: This paper proposes the Swin Unet3D model, which formulates voxel segmentation of medical images as a sequence-to-sequence prediction problem. The feature-extraction sub-module of the model is designed as a parallel structure of convolution and ViT, so that every layer of the model can adequately learn both global and local dependencies in the image.

Results: On the BraTS 2021 validation dataset, the proposed model achieves Dice coefficients of 0.840, 0.874, and 0.911 on the ET (enhancing tumor), TC (tumor core), and WT (whole tumor) channels, respectively. On the BraTS 2018 validation dataset, it achieves Dice coefficients of 0.716, 0.761, and 0.874 on the same channels.

Conclusion: We propose a new segmentation model that combines the strengths of the Vision Transformer and convolution and achieves a better balance between the number of model parameters and segmentation accuracy. The code is available at https://github.com/1152545264/SwinUnet3D.
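The abstract attributes the usual 7 × 7 × 7 ceiling on 3D kernel size to the parameter budget. A 3D convolution with a cubic kernel of side k, C_in input channels and C_out output channels has k³ · C_in · C_out weights plus C_out biases, so its size grows cubically with k. The sketch below only illustrates that arithmetic; the channel width of 64 is a hypothetical choice, not a value from the paper.

```python
def conv3d_param_count(kernel_size: int, in_channels: int, out_channels: int,
                       bias: bool = True) -> int:
    """Learnable parameters of a 3D convolution with a cubic kernel."""
    # One k*k*k filter per (input channel, output channel) pair.
    weights = kernel_size ** 3 * in_channels * out_channels
    # One bias term per output channel, if enabled.
    return weights + (out_channels if bias else 0)


if __name__ == "__main__":
    channels = 64  # hypothetical channel width, not taken from the paper
    for k in (3, 5, 7, 9):
        n = conv3d_param_count(k, channels, channels)
        print(f"{k}x{k}x{k} kernel: {n:,} parameters")
```

With 64 input and output channels, a single layer grows from 110,656 parameters at 3 × 3 × 3 to 1,404,992 at 7 × 7 × 7 (a factor of about 343/27 ≈ 12.7 in the weights), yet its receptive field still spans only 7 voxels per axis, which is the trade-off the abstract describes.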
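Describing voxel segmentation as sequence-to-sequence prediction means the volume is cut into patches, each patch becomes one token of a sequence, and the per-token outputs are placed back into the voxel grid. The minimal sketch below shows only that partition and its inverse for a single-channel volume in pure Python; the function names and the raster ordering of patches are assumptions for illustration, and the abstract does not state how Swin Unet3D embeds its patches.

```python
from typing import List, Tuple

Volume = List[List[List[float]]]  # indexed as vol[d][h][w], single channel


def to_patch_sequence(vol: Volume, p: int) -> List[List[float]]:
    """Split a D x H x W volume into non-overlapping p x p x p patches in
    raster order and flatten each patch into one token of length p**3."""
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    if D % p or H % p or W % p:
        raise ValueError("volume dimensions must be divisible by the patch size")
    tokens = []
    for d0 in range(0, D, p):
        for h0 in range(0, H, p):
            for w0 in range(0, W, p):
                tokens.append([vol[d][h][w]
                               for d in range(d0, d0 + p)
                               for h in range(h0, h0 + p)
                               for w in range(w0, w0 + p)])
    return tokens


def from_patch_sequence(tokens: List[List[float]],
                        shape: Tuple[int, int, int], p: int) -> Volume:
    """Inverse of to_patch_sequence: write each token back into its patch."""
    D, H, W = shape
    vol: Volume = [[[0.0] * W for _ in range(H)] for _ in range(D)]
    token_iter = iter(tokens)
    for d0 in range(0, D, p):
        for h0 in range(0, H, p):
            for w0 in range(0, W, p):
                values = iter(next(token_iter))
                for d in range(d0, d0 + p):
                    for h in range(h0, h0 + p):
                        for w in range(w0, w0 + p):
                            vol[d][h][w] = next(values)
    return vol


if __name__ == "__main__":
    # Toy 4 x 4 x 4 volume whose voxel value encodes its position.
    vol = [[[d * 16 + h * 4 + w for w in range(4)] for h in range(4)]
           for d in range(4)]
    tokens = to_patch_sequence(vol, 2)
    print(len(tokens), "tokens of length", len(tokens[0]))
    print("round trip ok:", from_patch_sequence(tokens, (4, 4, 4), 2) == vol)
```

With C input channels each token would hold C · p³ values. For example, the four BraTS MRI modalities cropped to 128³ voxels with p = 4 (a hypothetical crop and patch size) give a sequence of 32³ = 32,768 tokens of 256 values each, which is why attending over the full sequence at once is costly.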
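The results are reported as Dice coefficients, Dice = 2|P ∩ G| / (|P| + |G|) for a predicted mask P and ground-truth mask G, computed separately for the ET, TC and WT channels. The sketch below assumes the BraTS label convention (1 = necrotic / non-enhancing tumor core, 2 = peritumoral edema, 4 = enhancing tumor) and treats two empty masks as a perfect match; the paper's exact evaluation details, such as any smoothing term, are not given in the abstract, and the helper names are hypothetical.

```python
from typing import Dict, Sequence, Set

# Assumed BraTS label convention; each evaluation channel is a union of labels.
REGIONS: Dict[str, Set[int]] = {
    "ET": {4},        # enhancing tumor
    "TC": {1, 4},     # tumor core
    "WT": {1, 2, 4},  # whole tumor
}


def dice(pred: Sequence[bool], target: Sequence[bool]) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, g in zip(pred, target) if p and g)
    total = sum(1 for p in pred if p) + sum(1 for g in target if g)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def region_dice(pred_labels: Sequence[int],
                true_labels: Sequence[int]) -> Dict[str, float]:
    """Per-channel Dice for flattened BraTS-style label maps."""
    scores = {}
    for name, labels in REGIONS.items():
        p = [v in labels for v in pred_labels]
        g = [v in labels for v in true_labels]
        scores[name] = dice(p, g)
    return scores


if __name__ == "__main__":
    # Toy flattened 8-voxel label maps, for illustration only.
    truth = [0, 1, 1, 2, 2, 4, 4, 0]
    pred = [0, 1, 2, 2, 2, 4, 0, 0]
    print(region_dice(pred, truth))
```

Because the channels are nested (ET ⊆ TC ⊆ WT), a voxel mislabeled as edema instead of tumor core lowers TC but not WT, which is one reason WT scores tend to be the highest of the three.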
Pages: 13