CoT: Contourlet Transformer for Hierarchical Semantic Segmentation

Cited by: 0
Authors
Shao, Yilin [1]
Sun, Long [1]
Jiao, Licheng [1]
Liu, Xu [1]
Liu, Fang [1]
Li, Lingling [1]
Yang, Shuyuan [1]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Int Res Ctr Intelligent Percept & Computat, Minist Educ China, Key Lab Intelligent Percept & I, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Transformers; Semantics; Semantic segmentation; Task analysis; Computed tomography; Convolutional neural networks; Contourlet transform (CT); semantic segmentation; sparse convolution; Transformer-convolutional neural network (CNN) hybrid model;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The Transformer-convolutional neural network (CNN) hybrid learning approach is gaining traction for balancing deep and shallow image features in hierarchical semantic segmentation. However, such hybrid models still face a contradiction between comprehensive semantic understanding and meticulous detail extraction. To solve this problem, this article proposes a novel Transformer-CNN hybrid hierarchical network, dubbed contourlet transformer (CoT). In the CoT framework, the semantic representation process of the Transformer is unavoidably peppered with sparsely distributed points that, while not desired, demand finer detail. Therefore, we design a deep detail representation (DDR) structure to investigate their fine-grained features. First, through the contourlet transform (CT), we distill the high-frequency directional components from the raw image, yielding localized features that accommodate the inductive bias of CNNs. Second, a CNN deep sparse learning (DSL) module takes these components as input to represent the underlying detailed features. This memory- and energy-efficient learning method keeps the same sparse pattern between input and output. Finally, the decoder hierarchically fuses the detailed features with the semantic features in an image reconstruction-like fashion. Experiments demonstrate that CoT achieves competitive performance on three benchmark datasets: PASCAL Context [57.21% mean intersection over union (mIoU)], ADE20K (54.16% mIoU), and Cityscapes (84.23% mIoU). Furthermore, we conducted robustness studies to validate its resistance against various types of corruption. Our code is available at: https://github.com/yilinshao/CoT-Contourlet-Transformer.
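The abstract states that the DSL module keeps the same sparse pattern between input and output. The following is a minimal sketch of one way such a pattern-preserving convolution can behave, assuming a submanifold-style rule in which outputs are computed only at sites that are active in the input. The abstract does not specify the operator, so this rule, the function name `sparse_conv2d`, and the dictionary-based data layout are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Tuple

Site = Tuple[int, int]  # (row, col) coordinate of an active point


def sparse_conv2d(active: Dict[Site, float],
                  kernel: Dict[Site, float]) -> Dict[Site, float]:
    """Hypothetical pattern-preserving sparse convolution.

    `active` maps coordinates of non-zero (active) sites to values.
    `kernel` maps (dy, dx) offsets to weights.
    Outputs are produced only at input-active sites, so the sparse
    pattern of the output equals that of the input (an assumption
    modeled on submanifold sparse convolution).
    """
    out: Dict[Site, float] = {}
    for (y, x) in active:
        acc = 0.0
        for (dy, dx), w in kernel.items():
            # Inactive neighbours are skipped entirely: no memory or
            # compute is spent on them.
            v = active.get((y + dy, x + dx))
            if v is not None:
                acc += w * v
        out[(y, x)] = acc
    return out


# 3x3 kernel of ones, expressed as offset -> weight
kernel = {(dy, dx): 1.0 for dy in (-1, 0, 1) for dx in (-1, 0, 1)}

# Three active points: two adjacent, one isolated
points = {(0, 0): 1.0, (0, 1): 2.0, (5, 5): 3.0}
result = sparse_conv2d(points, kernel)
```

In this sketch the output has exactly the same set of active coordinates as the input, whereas a dense convolution would spread each point into its whole 3x3 neighbourhood and the active set would grow with every layer.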
Pages: 132-146
Page count: 15
Related papers
50 in total
  • [1] CoT: Contourlet Transformer for Hierarchical Semantic Segmentation
    Shao, Yilin
    Sun, Long
    Jiao, Licheng
    Liu, Xu
    Liu, Fang
    Li, Lingling
    Yang, Shuyuan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01) : 132 - 146
  • [2] Scene sketch semantic segmentation with hierarchical Transformer
    Yang, Jie
    Ke, Aihua
    Yu, Yaoxiang
    Cai, Bo
    KNOWLEDGE-BASED SYSTEMS, 2023, 280
  • [3] HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation
    Chen, Siyu
    Han, Ting
    Zhang, Changshe
    Su, Jinhe
    Wang, Ruisheng
    Chen, Yiping
    Wang, Zongyue
    Cai, Guorong
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025,
  • [4] HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation
    Ding, Jian
    Xue, Nan
    Xia, Gui-Song
    Schiele, Bernt
    Dai, Dengxin
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15413 - 15423
  • [5] Transformer Enhanced Hierarchical 3D Point Cloud Semantic Segmentation
    Liu, Yaohua
    Ma, Yue
    Xu, Min
    2ND INTERNATIONAL CONFERENCE ON APPLIED MATHEMATICS, MODELLING, AND INTELLIGENT COMPUTING (CAMMIC 2022), 2022, 12259
  • [6] TrSeg: Transformer for semantic segmentation
    Jin, Youngsaeng
    Han, David
    Ko, Hanseok
    PATTERN RECOGNITION LETTERS, 2021, 148 : 29 - 35
  • [7] Segmenter: Transformer for Semantic Segmentation
    Strudel, Robin
    Garcia, Ricardo
    Laptev, Ivan
    Schmid, Cordelia
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 7242 - 7252
  • [8] ELiFormer: A hierarchical Transformer based Model with Efficient Encoder and Lightweight Decoder for Semantic Segmentation
    Wu, Zixuan
    Zhou, Yue
    2024 2ND ASIA CONFERENCE ON COMPUTER VISION, IMAGE PROCESSING AND PATTERN RECOGNITION, CVIPPR 2024, 2024,
  • [9] A Hierarchical Loss for Semantic Segmentation
    Muller, Bruce
    Smith, William
    VISAPP: PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VOL 4: VISAPP, 2020, : 260 - 267
  • [10] Deep Hierarchical Semantic Segmentation
    Li, Liulei
    Zhou, Tianfei
    Wang, Wenguan
    Li, Jianwu
    Yang, Yi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 1236 - 1247