Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution

Cited by: 25

Authors:

Cai, Yimin [1]

Long, Yuqing [2]

Han, Zhenggong [3]

Liu, Mingkun [1]

Zheng, Yuchen [1]

Yang, Wei [1]

Chen, Liming [4]

Affiliations:

[1] Guizhou Univ, Sch Med, Guiyang, Peoples R China

[2] ZunYi Med Univ, Sch Stomatolog, Zunyi, Peoples R China

[3] Guizhou Univ, Key Lab Adv Mfg Technol, Minist Educ, Guiyang, Peoples R China

[4] Guizhou Univ, Dent Hosp Guizhou Univ, Guiyang Dent Hosp, Guiyang, Peoples R China

Source:

BMC MEDICAL INFORMATICS AND DECISION MAKING | 2023 / Vol. 23 / Issue 01

Keywords:

Deep learning; Medical image segmentation; 3D Swin Transformer; Brain tumor;

DOI:

10.1186/s12911-023-02129-z

Chinese Library Classification number:

R-058

Abstract:

Background: Semantic segmentation of brain tumors plays a critical role in clinical treatment, especially for three-dimensional (3D) magnetic resonance imaging, which is often used in clinical practice. Automatic segmentation of the 3D structure of brain tumors can quickly help physicians understand tumor properties such as shape and size, thus improving the efficiency of preoperative planning and the odds of successful surgery. In past decades, 3D convolutional neural networks (CNNs) have dominated automatic segmentation methods for 3D medical images, and these network structures have achieved good results. However, to limit the number of network parameters, practitioners generally keep the kernels in 3D convolutions no larger than 7 × 7 × 7, which limits the ability of CNNs to learn long-distance dependency information. Vision Transformer (ViT) is very good at learning long-distance dependency information in images, but it has a large number of parameters. Worse, when data are insufficient, ViT cannot learn local dependency information in its earlier layers. In image segmentation, however, the ability to learn this local dependency information in the earlier layers has a large effect on model performance.

Methods: This paper proposes the Swin Unet3D model, which treats voxel segmentation of medical images as a sequence-to-sequence prediction. The feature extraction sub-module of the model is designed as a parallel structure of convolution and ViT, so that all layers of the model can adequately learn both global and local dependency information in the image.

Results: On the BraTS 2021 validation dataset, the proposed model achieves Dice coefficients of 0.840, 0.874, and 0.911 on the ET (enhancing tumor), TC (tumor core), and WT (whole tumor) channels, respectively. On the BraTS 2018 validation dataset, the model achieves Dice coefficients of 0.716, 0.761, and 0.874 on the corresponding channels.

Conclusion: We propose a new segmentation model that combines the advantages of Vision Transformer and convolution and achieves a better balance between the number of model parameters and segmentation accuracy. The code can be found at https://github.com/1152545264/SwinUnet3D.
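The results in the abstract are reported as Dice coefficients. As a rough illustration of that metric only (not the authors' evaluation code), the sketch below computes Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks given as flattened voxel lists. The function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions made for this example.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two binary masks given as flattened voxel lists."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled foreground in both the prediction and the reference.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 2 x 2 x 2 volume, flattened to 8 voxels, for one channel.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 0, 0, 0, 1, 1, 0, 0]
score = dice_coefficient(pred, target)
print(f"Dice: {score:.3f}")
```

In a multi-channel setting such as ET, TC, and WT, a function like this would be applied to each channel's mask separately.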

Pages: 13

50 items in total

[1] Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution
Yimin Cai
Yuqing Long
Zhenggong Han
Mingkun Liu
Yuchen Zheng
Wei Yang
Liming Chen
BMC Medical Informatics and Decision Making, 23
[2] SwinE-UNet3+: swin transformer encoder network for medical image segmentation
Ping Zou
Jian-Sheng Wu
Progress in Artificial Intelligence, 2023, 12 : 99 - 105
[3] SwinE-UNet3+: swin transformer encoder network for medical image segmentation
Zou, Ping
Wu, Jian-Sheng
PROGRESS IN ARTIFICIAL INTELLIGENCE, 2023, 12 (01) : 99 - 105
[4] Combining Swin Transformer With UNet for Remote Sensing Image Semantic Segmentation
Fan, Lili
Zhou, Yu
Liu, Hongmei
Li, Yunjie
Cao, Dongpu
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61 : 1 - 11
[5] ConvWin-UNet: UNet-like hierarchical vision Transformer combined with convolution for medical image segmentation
Feng, Xiaomeng
Wang, Taiping
Yang, Xiaohang
Zhang, Minfei
Guo, Wanpeng
Wang, Weina
MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (01) : 128 - 144
[6] DSTUNET: UNET WITH EFFICIENT DENSE SWIN TRANSFORMER PATHWAY FOR MEDICAL IMAGE SEGMENTATION
Cai, Zhuotong
Xin, Jingmin
Shi, Peiwen
Wu, Jiayi
Zheng, Nanning
2022 IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (IEEE ISBI 2022), 2022
[7] Medical image segmentation by combining feature enhancement Swin Transformer and UperNet
Lin Zhang
Xiaochun Yin
Xuqi Liu
Zengguang Liu
Scientific Reports, 15 (1)
[8] A novel full-convolution UNet-transformer for medical image segmentation
Zhu, Tianyou
Ding, Derui
Wang, Feng
Liang, Wei
Wang, Bo
BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 89
[9] Swin Transformer Assisted Prior Attention Network for Medical Image Segmentation
Liao, Zhihao
Fan, Neng
Xu, Kai
APPLIED SCIENCES-BASEL, 2022, 12 (09)
[10] MDvT: introducing mobile three-dimensional convolution to a vision transformer for hyperspectral image classification
Zhou, Xinyao
Zhou, Wenzuo
Fu, Xiaoli
Hu, Yichen
Liu, Jinlian
INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2023, 16 (01) : 1469 - 1490