Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution

Cited by: 25
Authors
Cai, Yimin [1]
Long, Yuqing [2]
Han, Zhenggong [3]
Liu, Mingkun [1]
Zheng, Yuchen [1]
Yang, Wei [1]
Chen, Liming [4]
Affiliations
[1] Guizhou Univ, Sch Med, Guiyang, Peoples R China
[2] ZunYi Med Univ, Sch Stomatolog, Zunyi, Peoples R China
[3] Guizhou Univ, Key Lab Adv Mfg Technol, Minist Educ, Guiyang, Peoples R China
[4] Guizhou Univ, Dent Hosp Guizhou Univ, Guiyang Dent Hosp, Guiyang, Peoples R China
Keywords
Deep learning; Medical image segmentation; 3D Swin Transformer; Brain tumor
DOI
10.1186/s12911-023-02129-z
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Semantic segmentation of brain tumors plays a critical role in clinical treatment, especially for three-dimensional (3D) magnetic resonance imaging, which is often used in clinical practice. Automatic segmentation of the 3D structure of brain tumors helps physicians quickly understand tumor properties such as shape and size, improving the efficiency of preoperative planning and the odds of successful surgery. In past decades, 3D convolutional neural networks (CNNs) have dominated automatic segmentation methods for 3D medical images and have achieved good results. However, to limit the number of network parameters, practitioners generally keep the kernels of 3D convolutions no larger than 7 × 7 × 7, which limits the ability of CNNs to learn long-range dependencies. The Vision Transformer (ViT) is very good at learning long-range dependencies in images, but it has a large number of parameters. Worse, when data are insufficient, the ViT cannot learn local dependency information in its early layers, and in image segmentation the ability to learn this local information in the early layers has a large impact on model performance.

Methods: This paper proposes the Swin Unet3D model, which formulates voxel segmentation of medical images as a sequence-to-sequence prediction. The feature extraction sub-module is designed as a parallel structure of Convolution and ViT, so that all layers of the model can adequately learn both global and local dependency information in the image.

Results: On the BraTS 2021 validation dataset, the proposed model achieves Dice coefficients of 0.840, 0.874, and 0.911 on the ET, TC, and WT channels, respectively. On the BraTS 2018 validation dataset, it achieves Dice coefficients of 0.716, 0.761, and 0.874 on the corresponding channels.

Conclusion: We propose a new segmentation model that combines the advantages of Vision Transformer and Convolution and achieves a better balance between the number of model parameters and segmentation accuracy. The code can be found at https://github.com/1152545264/SwinUnet3D.
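The results above are reported as Dice coefficients, which for a predicted mask P and a reference mask T is 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch of that formula for a single binary channel, written in plain Python for illustration. The function names `flatten` and `dice_coefficient` are hypothetical and are not taken from the paper or its repository, and returning 1.0 when both masks are empty is an assumed convention, not necessarily the one used in the BraTS evaluation.

```python
def flatten(volume):
    """Yield every voxel value from an arbitrarily nested list (e.g. D x H x W)."""
    for item in volume:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def dice_coefficient(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks of identical shape."""
    p = [1 if v else 0 for v in flatten(pred)]
    t = [1 if v else 0 for v in flatten(target)]
    if len(p) != len(t):
        raise ValueError("pred and target must contain the same number of voxels")
    intersection = sum(a & b for a, b in zip(p, t))
    denominator = sum(p) + sum(t)
    if denominator == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / denominator


# Toy 2 x 2 x 2 volumes: each mask has 3 foreground voxels, 2 of them shared.
pred = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
target = [[[1, 0], [0, 0]], [[1, 1], [0, 0]]]
print(round(dice_coefficient(pred, target), 4))  # 0.6667
```

In a multi-channel setting such as ET, TC, and WT, a score like this would be computed once per channel on that channel's binary mask.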
Pages: 13