HCTA-Net: A Hybrid CNN-Transformer Attention Network for Surgical Instrument Segmentation

Cited by: 2
Authors
Yang, Lei [1]
Wang, Hongyong [1]
Bian, Guibin [1, 2]
Liu, Yanhong [1]
Affiliations
[1] Zhengzhou Univ, Sch Elect Engn, Zhengzhou 450001, Henan, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
Image segmentation; Feature extraction; Instruments; Transformers; Task analysis; Surgery; Robots; Surgical instruments; Deep architecture; Medical robotics; surgical instrument segmentation; transformer; residual network; deep supervision; FEATURE AGGREGATION; IMAGES
DOI
10.1109/TMRB.2023.3315479
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Surgical robots play an increasingly important role in surgery, and accurate surgical instrument segmentation is one of the important prerequisites for their stable operation. However, this task faces several challenging factors, such as scale variation and specular reflection. Recently, transformers have shown superior performance in image segmentation owing to their strong ability to capture long-range dependencies. However, they do not capture locality and translation invariance well. In this paper, combining the advantages of transformers and CNNs, a hybrid CNN-Transformer attention network, named HCTA-Net, is proposed for automatic surgical instrument segmentation. To extract more comprehensive feature information from surgical images, a dual-path encoding unit is proposed for effective representation of local detail features and global context. Meanwhile, an attention-based feature enhancement (AFE) module is proposed so that the features of the two encoding paths complement each other. In addition, to mitigate the limited processing capacity of simple connections, a multi-dimension attention (MDA) module is built to process the intermediate features along three directions, namely width, height, and space, filtering out interfering features while emphasizing the key regions of local feature maps. Further, an additive attention enhancement (AAE) module is introduced to further enhance local feature maps. Finally, to obtain richer multi-scale global information, a multi-scale context fusion (MCF) module is placed at the bottleneck layer to obtain different receptive fields and enrich the feature representation. Experimental results show that the proposed HCTA-Net achieves superior segmentation performance on surgical instruments compared with other state-of-the-art (SOTA) segmentation models.
Pages: 929-944
Number of pages: 16
Related papers
50 in total
  • [41] Semantic segmentation of terrace image regions based on lightweight CNN-Transformer hybrid networks
    Liu X.
    Yi S.
    Li L.
    Cheng X.
    Wang C.
    Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, 2023, 39 (13): 171-181
  • [42] CTMANet: A CNN-Transformer Hybrid Semantic Segmentation Network for Fine-Grained Airport Extraction in Complex SAR Scenes
    Wu, Keyu
    Cai, Feng
    Wang, Haipeng
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17: 4689-4704
  • [43] CTHPose: An Efficient and Effective CNN-Transformer Hybrid Network for Human Pose Estimation
    Chen, Danya
    Wu, Lijun
    Chen, Zhicong
    Lin, Xufeng
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT V, 2024, 14429: 327-339
  • [44] FFSwinNet: CNN-Transformer Combined Network With FFT for Shale Core SEM Image Segmentation
    Feng, Yilong
    Jia, Lijuan
    Zhang, Jinchuan
    Chen, Junqi
    IEEE ACCESS, 2024, 12: 73021-73032
  • [45] BiTr-Unet: A CNN-Transformer Combined Network for MRI Brain Tumor Segmentation
    Jia, Qiran
    Shu, Hai
    BRAINLESION: GLIOMA, MULTIPLE SCLEROSIS, STROKE AND TRAUMATIC BRAIN INJURIES, BRAINLES 2021, PT II, 2022, 12963: 3-14
  • [46] Multi-scale Gaussian Difference Preprocessing and Dual Stream CNN-Transformer Hybrid Network for Skin Lesion Segmentation
    Zhao, Xin
    Ren, Zhihang
    MULTIMEDIA MODELING, MMM 2023, PT II, 2023, 13834: 671-682
  • [47] HCformer: Hybrid CNN-Transformer for LDCT Image Denoising
    Yuan, Jinli
    Zhou, Feng
    Guo, Zhitao
    Li, Xiaozeng
    Yu, Hengyong
    JOURNAL OF DIGITAL IMAGING, 2023, 36 (05): 2290-2305
  • [48] TransSea: Hybrid CNN-Transformer With Semantic Awareness for 3-D Brain Tumor Segmentation
    Liu, Yu
    Ma, Yize
    Zhu, Zhiqin
    Cheng, Juan
    Chen, Xun
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73
  • [49] Cross Attention Multi Scale CNN-Transformer Hybrid Encoder Is General Medical Image Learner
    Zhou, Rongzhou
    Yao, Junfeng
    Hong, Qingqi
    Li, Xingxin
    Cao, Xianpeng
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT XIII, 2024, 14437: 85-97
  • [50] CNN-Transformer Hybrid Architecture for Early Fire Detection
    Yang, Chenyue
    Pan, Yixuan
    Cao, Yichao
    Lu, Xiaobo
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532: 570-581