SDPT: Semantic-Aware Dimension-Pooling Transformer for Image Segmentation

Authors
Cao, Hu [1]
Chen, Guang [2]
Zhao, Hengshuang [3]
Jiang, Dongsheng [4]
Zhang, Xiaopeng [4]
Tian, Qi [4]
Knoll, Alois [1]
Affiliations
[1] Tech Univ Munich, Chair Robot Artificial Intelligence & Real Time Sy, D-80333 Munich, Germany
[2] Tongji Univ, Dept Comp Sci & Technol, Shanghai 200070, Peoples R China
[3] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[4] Huawei Technol, Shanghai 200122, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Image segmentation; Decoding; Task analysis; Semantics; Image edge detection; Computational efficiency; vision transformer; dimension-pooling attention; semantic-balanced decoder; scene understanding; VISION; SHIFT;
DOI
10.1109/TITS.2024.3417813
Chinese Library Classification
TU [Building Science]
Discipline classification code
0813
Abstract
Image segmentation plays a critical role in autonomous driving by providing vehicles with a detailed and accurate understanding of their surroundings. Transformers have recently shown encouraging results in image segmentation. However, it is challenging for transformer-based models to strike a good balance between performance and efficiency. Their computational complexity is quadratic in the number of input tokens, which severely hinders their application to dense prediction tasks. In this paper, we present the semantic-aware dimension-pooling transformer (SDPT) to mitigate the conflict between accuracy and efficiency. The proposed model comprises an efficient transformer encoder for generating hierarchical features and a semantic-balanced decoder for predicting semantic masks. In the encoder, a dimension-pooling mechanism is used in the multi-head self-attention (MHSA) to reduce the computational cost, and a parallel depth-wise convolution is used to capture local semantics. We further apply this dimension-pooling attention (DPA) to the decoder as a refinement module to integrate multi-level features. With such a simple yet powerful encoder-decoder framework, we empirically demonstrate that the proposed SDPT achieves excellent performance and efficiency on various popular benchmarks, including ADE20K, Cityscapes, and COCO-Stuff. For example, SDPT achieves 48.6% mIoU on the ADE20K dataset, outperforming current methods at lower computational cost. The code is available at https://github.com/HuCaoFighting/SDPT.
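To make the efficiency argument concrete, the following minimal sketch counts query-key pairs for full self-attention versus attention over a pooled key/value set. It is illustrative only: the abstract does not specify how the dimension pooling is performed, so the sketch assumes (hypothetically) that keys and values are pooled once along the height and once along the width of an H x W feature map, leaving H + W pooled tokens. The function names are invented for this example and do not come from the SDPT code base.

```python
def full_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every token attends to every token.

    With N = height * width tokens, the count is N * N, which is the
    quadratic growth the abstract refers to.
    """
    num_tokens = height * width
    return num_tokens * num_tokens


def pooled_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when keys/values are pooled along each spatial axis.

    ASSUMPTION (not confirmed by the abstract): pooling over the width
    leaves `height` tokens and pooling over the height leaves `width`
    tokens, so each of the N queries attends to height + width keys.
    """
    num_tokens = height * width
    return num_tokens * (height + width)


if __name__ == "__main__":
    h, w = 64, 64  # hypothetical feature-map size
    full = full_attention_pairs(h, w)
    pooled = pooled_attention_pairs(h, w)
    print(f"full: {full}, pooled: {pooled}, reduction: {full // pooled}x")
```

Under this assumption the pair count grows as N(H + W) rather than N^2, which is the kind of saving that matters for dense prediction on high-resolution inputs. The actual reduction in SDPT depends on the pooling scheme described in the paper.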
Pages: 15934-15946
Page count: 13