HCTA-Net: A Hybrid CNN-Transformer Attention Network for Surgical Instrument Segmentation

Cited by: 2
Authors
Yang, Lei [1]
Wang, Hongyong [1]
Bian, Guibin [1, 2]
Liu, Yanhong [1]
Affiliations
[1] Zhengzhou Univ, Sch Elect Engn, Zhengzhou 450001, Henan, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
Image segmentation; Feature extraction; Instruments; Transformers; Task analysis; Surgery; Robots; Surgical instruments; Deep architecture; Medical robotics; surgical instrument segmentation; transformer; residual network; deep supervision; FEATURE AGGREGATION; IMAGES;
DOI
10.1109/TMRB.2023.3315479
Chinese Library Classification number
R318 [Biomedical Engineering]
Subject classification code
0831
Abstract
Surgical robots play an increasingly important role in surgery, and accurate surgical instrument segmentation is an important prerequisite for their stable operation. However, the task faces challenging factors such as scale changes and specular reflection. Recently, transformers have shown superior performance in image segmentation owing to their strong ability to capture long-range dependencies, but they do not capture locality and translation invariance well. In this paper, combining the advantages of transformers and CNNs, a hybrid CNN-Transformer attention network, named HCTA-Net, is proposed for automatic surgical instrument segmentation. To extract more comprehensive feature information from surgical images, a dual-path encoding unit is proposed to represent both local detail features and global context. Meanwhile, an attention-based feature enhancement (AFE) module is proposed so that the features of the two encoding paths complement each other. In addition, to mitigate the limited processing capacity of simple connections, a multi-dimension attention (MDA) module is built to process the intermediate features along three directions (width, height, and space), filtering out interfering features while emphasizing the key regions of local feature maps. Further, an additive attention enhancement (AAE) module is introduced to enhance the local feature maps. Finally, to obtain more multi-scale global information, a multi-scale context fusion (MCF) module is proposed at the bottleneck layer to provide different receptive fields and enrich the feature representation. Experimental results show that the proposed HCTA-Net achieves superior segmentation performance on surgical instruments compared with other state-of-the-art (SOTA) segmentation models.
Pages: 929-944
Page count: 16