SDPT: Semantic-Aware Dimension-Pooling Transformer for Image Segmentation

Cited by: 0
Authors
Cao, Hu [1]
Chen, Guang [2]
Zhao, Hengshuang [3]
Jiang, Dongsheng [4]
Zhang, Xiaopeng [4]
Tian, Qi [4]
Knoll, Alois [1]
Affiliations
[1] Tech Univ Munich, Chair Robot Artificial Intelligence & Real Time Sy, D-80333 Munich, Germany
[2] Tongji Univ, Dept Comp Sci & Technol, Shanghai 200070, Peoples R China
[3] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[4] Huawei Technol, Shanghai 200122, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Image segmentation; Decoding; Task analysis; Semantics; Image edge detection; Computational efficiency; vision transformer; dimension-pooling attention; semantic-balanced decoder; scene understanding; VISION; SHIFT;
DOI
10.1109/TITS.2024.3417813
Chinese Library Classification number
TU [Architectural Science];
Discipline classification code
0813 ;
Abstract
Image segmentation plays a critical role in autonomous driving by providing vehicles with a detailed and accurate understanding of their surroundings. Transformers have recently shown encouraging results in image segmentation. However, it remains challenging for transformer-based models to strike a good balance between performance and efficiency. The computational complexity of these models is quadratic in the number of input tokens, which severely hinders their application in dense prediction tasks. In this paper, we present the semantic-aware dimension-pooling transformer (SDPT) to mitigate the conflict between accuracy and efficiency. The proposed model comprises an efficient transformer encoder for generating hierarchical features and a semantic-balanced decoder for predicting semantic masks. In the encoder, a dimension-pooling mechanism is used in the multi-head self-attention (MHSA) to reduce the computational cost, and a parallel depth-wise convolution is used to capture local semantics. We also apply this dimension-pooling attention (DPA) in the decoder as a refinement module to integrate multi-level features. With this simple yet powerful encoder-decoder framework, we empirically demonstrate that the proposed SDPT achieves excellent performance and efficiency on several popular benchmarks, including ADE20K, Cityscapes, and COCO-Stuff. For example, SDPT achieves 48.6% mIoU on the ADE20K dataset, outperforming current methods at lower computational cost. The code is available at https://github.com/HuCaoFighting/SDPT.
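The abstract's efficiency argument can be illustrated with a small counting sketch. It is not the authors' implementation: it assumes, purely for illustration, that dimension pooling averages the feature map along each spatial dimension, so that queries attend to height + width pooled tokens rather than to all height x width tokens. The function names are hypothetical.

```python
# Hypothetical sketch: counts attention-score entries under a stated
# assumption about dimension pooling. It is not the SDPT code.


def full_attention_entries(height: int, width: int) -> int:
    """Score-matrix entries when every token attends to every token."""
    tokens = height * width
    return tokens * tokens  # quadratic in the number of tokens


def pooled_attention_entries(height: int, width: int) -> int:
    """Score-matrix entries when keys/values are pooled per dimension.

    Assumption: averaging over the width leaves `height` tokens and
    averaging over the height leaves `width` tokens, so each query
    sees height + width keys instead of height * width.
    """
    tokens = height * width
    return tokens * (height + width)


def pool_along_dimensions(grid):
    """Average a 2D grid of scalars over each row, then over each column."""
    row_means = [sum(row) / len(row) for row in grid]
    col_means = [sum(col) / len(grid) for col in zip(*grid)]
    return row_means + col_means


if __name__ == "__main__":
    h, w = 128, 128
    full = full_attention_entries(h, w)
    pooled = pooled_attention_entries(h, w)
    print(f"full: {full}, pooled: {pooled}, ratio: {full // pooled}")
```

Under this assumption, a 128 x 128 feature map needs 2^28 score entries with full attention and 2^22 with pooled keys and values, a 64-fold reduction. The actual pooling scheme and savings in SDPT may differ; the linked repository is the authoritative source.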
Pages: 15934 - 15946
Number of pages: 13
Related papers
50 in total
  • [1] SaTransformer: Semantic-aware transformer for breast cancer classification and segmentation
    Zhang, Jie
    Zhang, Zhichao
    Liu, Hua
    Xu, Shiqiang
    [J]. IET IMAGE PROCESSING, 2023, 17 (13) : 3789 - 3800
  • [2] Semantic-aware Transformer for shadow detection
    Zhou, Kai
    Fang, Jing-Long
    Wu, Wen
    Shao, Yan-Li
    Wang, Xing-Qi
    Wei, Dan
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 240
  • [3] SDTP: Semantic-Aware Decoupled Transformer Pyramid for Dense Image Prediction
    Li, Zekun
    Liu, Yufan
    Li, Bing
    Feng, Bailan
    Wu, Kebin
    Peng, Chengwei
    Hu, Weiming
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6160 - 6173
  • [4] Semantic-Aware Domain Generalized Segmentation
    Peng, Duo
    Lei, Yinjie
    Hayat, Munawar
    Guo, Yulan
    Li, Wen
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022 : 2584 - 2595
  • [5] Semantic-Aware Superpixel for Weakly Supervised Semantic Segmentation
    Kim, Sangtae
    Park, Daeyoung
    Shim, Byonghyo
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023 : 1142 - 1150
  • [6] Co-learning Semantic-Aware Unsupervised Segmentation for Pathological Image Registration
    Liu, Yang
    Gu, Shi
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT X, 2023, 14229 : 537 - 547
  • [7] Semantic-Aware Contrastive Learning for Multi-Object Medical Image Segmentation
    Lee, Ho Hin
    Tang, Yucheng
    Yang, Qi
    Yu, Xin
    Cai, Leon Y.
    Remedios, Lucas W.
    Bao, Shunxing
    Landman, Bennett A.
    Huo, Yuankai
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2023, 27 (09) : 4444 - 4453
  • [8] Semantic-Aware Dynamic Parameter for Video Inpainting Transformer
    Lee, Eunhye
    Yoo, Jinsu
    Yang, Yunjeong
    Baik, Sungyong
    Kim, Tae Hyun
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023 : 12903 - 12912
  • [9] Semantic-Aware Visual Decomposition for Image Coding
    Chang, Jianhui
    Zhang, Jian
    Li, Jiguo
    Wang, Shiqi
    Mao, Qi
    Jia, Chuanmin
    Ma, Siwei
    Gao, Wen
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023, 131 : 2333 - 2355
  • [10] Semantic-Aware Triplet Loss for Image Classification
    Wang, Guangzhi
    Guo, Yangyang
    Xu, Ziwei
    Wong, Yongkang
    Kankanhalli, Mohan S.
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 4563 - 4572