DTMFormer: Dynamic Token Merging for Boosting Transformer-Based Medical Image Segmentation

Citations: 0
Authors
Wang, Zhehao [1]
Lin, Xian [1]
Wu, Nannan [1]
Yu, Li [1]
Cheng, Kwang-Ting [2]
Yan, Zengqiang [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan, Peoples R China
[2] Hong Kong Univ Sci & Technol, Sch Engn, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
ATTENTION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite their great potential for capturing long-range dependencies, transformers in medical image segmentation suffer from a rarely explored underlying issue: attention collapse, which often makes the transformer degenerate into a bypass module in CNN-Transformer hybrid architectures. This is due to the high computational complexity of vision transformers, which demands extensive training data, while well-annotated medical image data is relatively limited, resulting in poor convergence. In this paper, we propose a plug-n-play transformer block with dynamic token merging, named DTMFormer, to avoid building long-range dependencies on redundant and duplicated tokens and thus pursue better convergence. Specifically, DTMFormer consists of an attention-guided token merging (ATM) module, which adaptively clusters tokens into fewer semantic tokens based on feature and dependency similarity, and a light token reconstruction module, which fuses ordinary and semantic tokens. In this way, as self-attention in ATM is calculated on fewer tokens, DTMFormer has lower complexity and converges more easily. Extensive experiments on publicly available datasets demonstrate the effectiveness of DTMFormer as a plug-n-play module for simultaneous complexity reduction and performance improvement. We believe it will inspire future work on rethinking transformers in medical image segmentation. Code: https://github.com/iam-nacl/DTMFormer.
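The merge, attend, reconstruct flow described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the clustering here is a plain k-means on feature distance only (the paper's ATM also uses dependency similarity), the reconstruction is a simple residual add, and all function names (`merge_tokens`, `self_attention`, `merged_attention_block`) are hypothetical.

```python
import numpy as np

def merge_tokens(tokens, num_semantic, iters=5):
    """Cluster N tokens into num_semantic tokens.

    Assumption: plain k-means on feature distance, used only as a
    stand-in for the attention-guided merging described in the abstract.
    """
    n, _ = tokens.shape
    # Deterministic init: evenly spaced tokens serve as initial centers.
    init_idx = np.linspace(0, n - 1, num_semantic).astype(int)
    centers = tokens[init_idx].copy()

    def assign_to_centers(c):
        # Squared Euclidean distance from every token to every center: (N, K).
        dist = ((tokens[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
        return dist.argmin(axis=1)

    for _ in range(iters):
        assign = assign_to_centers(centers)
        for k in range(num_semantic):
            members = tokens[assign == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return centers, assign_to_centers(centers)

def self_attention(x):
    """Single-head self-attention without learned projections (illustrative)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

def merged_attention_block(tokens, num_semantic):
    """Attend over K semantic tokens, then fuse them back into all N tokens."""
    semantic, assign = merge_tokens(tokens, num_semantic)
    attended = self_attention(semantic)  # attention map is K x K, not N x N
    # Assumed reconstruction: each ordinary token adds its cluster's semantic token.
    return tokens + attended[assign]
```

With N = 64 tokens merged into K = 16 semantic tokens, the attention map in this sketch has 16 x 16 entries instead of 64 x 64, which is the source of the complexity reduction the abstract refers to.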
Pages: 5814-5822
Page count: 9
Related papers
50 in total
  • [1] Transformer-Based Annotation Bias-Aware Medical Image Segmentation
    Liao, Zehui
    Hu, Shishuai
    Xie, Yutong
    Xia, Yong
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IV, 2023, 14223 : 24 - 34
  • [2] A Transformer-Based Network for Anisotropic 3D Medical Image Segmentation
    Guo, Danfeng
    Terzopoulos, Demetri
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 8857 - 8861
  • [3] SMESwin Unet: Merging CNN and Transformer for Medical Image Segmentation
    Wang, Ziheng
    Min, Xiongkuo
    Shi, Fangyu
    Jin, Ruinian
    Nawrin, Saida S.
    Yu, Ichen
    Nagatomi, Ryoichi
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT V, 2022, 13435 : 517 - 526
  • [4] Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation
    Cam Nguyen
    Asad, Zuhayr
    Deng, Ruining
    Huo, Yuankai
    [J]. MEDICAL IMAGING 2022: IMAGE PROCESSING, 2022, 12032
  • [5] Recent progress in transformer-based medical image analysis
    Liu, Zhaoshan
    Lv, Qiujie
    Yang, Ziduo
    Li, Yifan
    Lee, Chau Hung
    Shen, Lei
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 164
  • [6] A Transformer-Based Network for Deformable Medical Image Registration
    Wang, Yibo
    Qian, Wen
    Li, Mengqi
    Zhang, Xuming
    [J]. ARTIFICIAL INTELLIGENCE, CICAI 2022, PT I, 2022, 13604 : 502 - 513
  • [7] ResTrans-Unet: A Residual-Aware Transformer-Based Approach to Medical Image Segmentation
    Ma, Fengying
    Wang, Zhi
    Ji, Peng
    Fu, Chengcai
    Wang, Feng
    [J]. INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2024, 34 (04)
  • [8] Abstract: 3D Medical Image Segmentation with Transformer-based Scaling of ConvNets MedNeXt
    Roy, Saikat
    Koehler, Gregor
    Baumgartner, Michael
    Ulrich, Constantin
    Isensee, Fabian
    Jaeger, Paul F.
    Maier-Hein, Klaus
    [J]. BILDVERARBEITUNG FUR DIE MEDIZIN 2024, 2024, : 79 - 79
  • [9] TransWS: Transformer-Based Weakly Supervised Histology Image Segmentation
    Zhang, Shaoteng
    Zhang, Jianpeng
    Xia, Yong
    [J]. MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2022, 2022, 13583 : 367 - 376
  • [10] TMFormer: Token Merging Transformer for Brain Tumor Segmentation with Missing Modalities
    Zhang, Zheyu
    Yang, Gang
    Zhang, Yueyi
    Yue, Huanjing
    Liu, Aiping
    Ou, Yunwei
    Gong, Jian
    Sun, Xiaoyan
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7414 - 7422