UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation

Cited by: 308

Authors:

Gao, Yunhe [1]

Zhou, Mu [1,2]

Metaxas, Dimitris N. [1]

Affiliations:

[1] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854 USA

[2] SenseBrain & Shanghai AI Lab & Ctr Perceptual & I, Shanghai, Peoples R China

Source:

MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT III | 2021 / Vol. 12903

DOI:

10.1007/978-3-030-87199-4_6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Transformer architecture has proven successful in a number of natural language processing tasks. However, its applications to medical vision remain largely unexplored. In this study, we present UTNet, a simple yet powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network to enhance medical image segmentation. UTNet applies self-attention modules in both the encoder and the decoder to capture long-range dependencies at different scales with minimal overhead. To this end, we propose an efficient self-attention mechanism, along with relative position encoding, that reduces the complexity of the self-attention operation significantly, from O(n^2) to approximately O(n). A new self-attention decoder is also proposed to recover fine-grained details from the skip connections in the encoder. Our approach addresses the dilemma that Transformers require huge amounts of data to learn vision inductive bias. Our hybrid layer design allows the initialization of Transformers within convolutional networks without the need for pre-training. We have evaluated UTNet on a multi-label, multi-vendor cardiac magnetic resonance imaging cohort. UTNet demonstrates superior segmentation performance and robustness compared with state-of-the-art approaches, holding the promise to generalize well to other medical image segmentation tasks.
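The abstract does not spell out how the complexity reduction is achieved. As a rough illustration only, the sketch below compares standard self-attention, whose score matrix is n x n, with a variant that shrinks the keys and values to a fixed number of tokens m before attending, so the score matrix is n x m and the cost grows linearly in n for fixed m. The average-pooling reduction, the function names (full_attention, pooled_attention), and the sizes used are assumptions made for this example, not the authors' implementation, and relative position encoding is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(q, k, v):
    # q, k, v: (n, d). The score matrix is (n, n): O(n^2) time and memory.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def pooled_attention(q, k, v, m):
    # Assumed reduction (hypothetical): average-pool keys and values
    # down to m tokens, so the score matrix is (n, m) instead of (n, n).
    n, d = k.shape
    assert n % m == 0, "n must be divisible by m in this simple sketch"
    k_small = k.reshape(m, n // m, d).mean(axis=1)
    v_small = v.reshape(m, n // m, d).mean(axis=1)
    scores = q @ k_small.T / np.sqrt(d)
    return softmax(scores) @ v_small

rng = np.random.default_rng(0)
n, d, m = 1024, 16, 64
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

out_full = full_attention(q, k, v)         # builds a 1024 x 1024 score matrix
out_small = pooled_attention(q, k, v, m)   # builds a 1024 x 64 score matrix
print(out_full.shape, out_small.shape)
```

With m held constant, doubling n doubles the work in pooled_attention, whereas it quadruples the work in full_attention. Setting m equal to n makes the pooling a no-op and recovers the full attention output.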


Pages: 61-71

Page count: 11
