Comprehensive attention transformer for multi-label segmentation of medical images based on multi-scale feature fusion

Cited: 0
Authors
Cheng, Hangyuan [1 ]
Guo, Xiaoxin [2 ]
Yang, Guangqi [1 ]
Chen, Cong [3 ]
Dong, Hongliang [1 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[3] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
Comprehensive attention; Mixed-attention transformer; Multi-scale aggregation; Transformer; UNET PLUS PLUS; NETWORKS;
DOI
10.1016/j.compeleceng.2025.110100
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Transformer-based models often neglect the capability of convolutional networks to extract local features. Most U-shaped models utilize only the multi-scale features from the encoder's output and focus solely on the final layer of the decoder's output. Moreover, typical skip connections can be configured only between the encoder and the decoder at the same layer, with no room for optimization. An innovative comprehensive attention Transformer (CAFormer) is proposed to address the issue of long-range relations and local features in multi-label segmentation of medical images, which adopts a U-shaped hierarchical encoder-decoder structure. A mixed-attention Transformer (MATrans) module is devised to extract multi-scale features from the encoders and establish multiple encoder-decoder connections using channel-wise cross-attention and self-attention, which can automatically configure the optimal skip connections. During upsampling, a channel-based feature fusion module is proposed to focus on the important channel-based features. A comprehensive attention module (CAM) is designed to extract global context and local features by integrating an enhanced Transformer module with a channel and a spatial attention module. Additionally, the encoder's multi-scale features undergo a hierarchical prediction link through the proposed multi-scale aggregation module (MSAM) for the final prediction, rather than directly using the output of the last layer of the decoder as the segmentation outcome. The experiments show that CAFormer is efficient and robust, achieves a DSC of 82.26 and an HD of 15.26 on the Synapse dataset, and outperforms other state-of-the-art models. The code and pre-trained models are available at https://github.com/zed-kingc/CAFormer.
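The channel-wise cross-attention that the abstract describes for the MATrans skip connections can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name, the flattened `(C, H*W)` feature layout, the scaling factor, and the residual addition are all illustrative choices; the key idea shown is that attention is computed over channels, so the affinity matrix is `(C, C)` regardless of spatial resolution.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(enc_feat, dec_feat):
    """Channel-wise cross-attention: decoder channels query encoder channels.

    enc_feat, dec_feat: arrays of shape (C, H*W), one flattened spatial
    map per channel. The (C, C) channel-affinity matrix stays small even
    for large images, unlike spatial attention whose cost grows with H*W.
    """
    C, N = dec_feat.shape
    # Affinity between every decoder channel (query) and every encoder
    # channel (key), normalized per query channel.
    attn = softmax(dec_feat @ enc_feat.T / np.sqrt(N), axis=-1)  # (C, C)
    # Re-weight the encoder channels and keep a residual decoder path.
    return attn @ enc_feat + dec_feat  # (C, H*W)
```

In a U-shaped network such a block would sit on each skip connection, letting the decoder select which encoder channels to fuse instead of concatenating same-level features unconditionally.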
Pages: 17
Related papers
50 records total
  • [21] Multi-scale attention fusion network for semantic segmentation of remote sensing images
    Wen, Zhiqiang
    Huang, Hongxu
    Liu, Shuai
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2023, 44 (24) : 7909 - 7926
  • [23] Deep multi-scale feature fusion for pancreas segmentation from CT images
    Chen, Zhanlan
    Wang, Xiuying
    Yan, Ke
    Zheng, Jiangbin
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2020, 15 (03) : 415 - 423
  • [24] Multi-scale feature fusion for pavement crack detection based on Transformer
    Yang, Yalong
    Niu, Zhen
    Su, Liangliang
    Xu, Wenjing
    Wang, Yuanhang
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (08) : 14920 - 14937
  • [25] Multi-scale Feature Fusion Object Detection Based on Swin Transformer
    Zhang, Ying
    Wu, Lin
    Deng, Huaxuan
    Hu, Jun
    Li, Xifan
    39TH YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION, YAC 2024, 2024, : 1982 - 1987
  • [26] Dynamic Q&A multi-label classification based on adaptive multi-scale feature extraction
    Li, Ying
    Li, Ming
    Zhang, Xiaoyi
    Ding, Jin
    APPLIED SOFT COMPUTING, 2025, 170
  • [27] An Optimized PatchMatch for multi-scale and multi-feature label fusion
    Giraud, Remi
    Vinh-Thong Ta
    Papadakis, Nicolas
    Manjon, Jose V.
    Collins, D. Louis
    Coupe, Pierrick
    NEUROIMAGE, 2016, 124 : 770 - 782
  • [28] FNeXter: A Multi-Scale Feature Fusion Network Based on ConvNeXt and Transformer for Retinal OCT Fluid Segmentation
    Niu, Zhiyuan
    Deng, Zhuo
    Gao, Weihao
    Bai, Shurui
    Gong, Zheng
    Chen, Chucheng
    Rong, Fuju
    Li, Fang
    Ma, Lan
    SENSORS, 2024, 24 (08)
  • [29] Multi-scale feature fusion attention of stereo vision depth recovery network based on Swin Transformer
    Zou, Changjun
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2025, 19 (01): : 149 - 166
  • [30] GCFormer: Multi-scale feature plays a crucial role in medical images segmentation
    Feng, Yuncong
    Cong, Yeming
    Xing, Shuaijie
    Wang, Hairui
    Ren, Zihang
    Zhang, Xiaoli
    KNOWLEDGE-BASED SYSTEMS, 2024, 300