Segmenter: Transformer for Semantic Segmentation

Cited by: 824
Authors
Strudel, Robin [1]
Garcia, Ricardo [1]
Laptev, Ivan [1]
Schmid, Cordelia [1]
Affiliations
[1] PSL Res Univ, Inria, Ecole Normale Super, CNRS, F-75005 Paris, France
DOI
10.1109/ICCV48922.2021.00717
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image segmentation is often ambiguous at the level of individual image patches and requires contextual information to reach label consensus. In this paper we introduce Segmenter, a transformer model for semantic segmentation. In contrast to convolution-based methods, our approach models global context from the first layer onward and throughout the network. We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation. To do so, we rely on the output embeddings corresponding to image patches and obtain class labels from these embeddings with a point-wise linear decoder or a mask transformer decoder. We leverage models pre-trained for image classification and show that we can fine-tune them on the moderate-sized datasets available for semantic segmentation. The linear decoder already yields excellent results, and performance can be further improved by a mask transformer that generates class masks. We conduct an extensive ablation study to show the impact of the different parameters; in particular, performance is better for large models and small patch sizes. Segmenter attains excellent results for semantic segmentation: it outperforms the state of the art on both the ADE20K and Pascal Context datasets and is competitive on Cityscapes.
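The point-wise linear decoder mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `linear_decode`, the toy dimensions, and the nearest-neighbour upsampling step are assumptions chosen to keep the example short and dependency-free. It only shows the idea that one shared linear layer maps each patch embedding to class scores, which are then spread back over the pixels of that patch.

```python
import random


def linear_decode(patch_embeddings, weights, bias, grid_h, grid_w, patch_size):
    """Hypothetical helper: label every pixel from per-patch embeddings.

    patch_embeddings: grid_h * grid_w vectors of length D, in row-major order
    weights: K rows of length D (one row per class); bias: K values
    Returns a (grid_h * patch_size) x (grid_w * patch_size) map of class indices.
    """
    assert len(patch_embeddings) == grid_h * grid_w

    # Point-wise linear layer: the same (weights, bias) is applied to each patch.
    patch_labels = []
    for emb in patch_embeddings:
        logits = [sum(w * x for w, x in zip(row, emb)) + b
                  for row, b in zip(weights, bias)]
        patch_labels.append(max(range(len(logits)), key=logits.__getitem__))

    # Nearest-neighbour upsampling (an assumption made for brevity):
    # each pixel inherits the label of the patch that contains it.
    label_map = []
    for y in range(grid_h * patch_size):
        row = []
        for x in range(grid_w * patch_size):
            row.append(patch_labels[(y // patch_size) * grid_w + (x // patch_size)])
        label_map.append(row)
    return label_map


# Toy usage with random values standing in for encoder outputs.
random.seed(0)
D, K = 4, 3                      # embedding size and number of classes (toy values)
grid_h, grid_w, patch_size = 2, 3, 2
embeddings = [[random.gauss(0, 1) for _ in range(D)]
              for _ in range(grid_h * grid_w)]
weights = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
bias = [0.0] * K
label_map = linear_decode(embeddings, weights, bias, grid_h, grid_w, patch_size)
```

In this sketch the output resolution depends only on the patch grid and the patch size, so a smaller patch size gives a finer label map for the same image, which fits the abstract's remark that small patch sizes perform better.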
Pages: 7242-7252
Number of pages: 11
Related papers
50 in total
  • [1] Graph-Segmenter: graph transformer with boundary-aware attention for semantic segmentation
    Wu, Zizhang
    Gan, Yuanzhu
    Xu, Tianhao
    Wang, Fan
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (05)
  • [2] Graph-Segmenter: graph transformer with boundary-aware attention for semantic segmentation
    Zizhang Wu
    Yuanzhu Gan
    Tianhao Xu
    Fan Wang
    [J]. Frontiers of Computer Science, 2024, 18
  • [3] TrSeg: Transformer for semantic segmentation
    Jin, Youngsaeng
    Han, David
    Ko, Hanseok
    [J]. PATTERN RECOGNITION LETTERS, 2021, 148 : 29 - 35
  • [4] Transformer Scale Gate for Semantic Segmentation
    Shi, Hengcan
    Hayat, Munawar
    Cai, Jianfei
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 3051 - 3060
  • [5] TransRVNet: LiDAR Semantic Segmentation With Transformer
    Cheng, Hui-Xian
    Han, Xian-Feng
    Xiao, Guo-Qiang
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (06) : 5895 - 5907
  • [6] Pyramid Fusion Transformer for Semantic Segmentation
    Qin, Zipeng
    Liu, Jianbo
    Zhang, Xiaolin
    Tian, Maoqing
    Zhou, Aojun
    Yi, Shuai
    Li, Hongsheng
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9630 - 9643
  • [7] SSformer: A Lightweight Transformer for Semantic Segmentation
    Shi, Wentao
    Xu, Jing
    Gao, Pan
    [J]. 2022 IEEE 24TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2022
  • [8] Scene sketch semantic segmentation with hierarchical Transformer
    Yang, Jie
    Ke, Aihua
    Yu, Yaoxiang
    Cai, Bo
    [J]. KNOWLEDGE-BASED SYSTEMS, 2023, 280
  • [9] Graph Structure Guided Transformer for Semantic Segmentation
    Qian, Luyang
    Zhang, Canlong
    Li, Zhixin
    Wang, Zhiwen
    [J]. 2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022, : 915 - 922
  • [10] MMSFormer: Multimodal Transformer for Material and Semantic Segmentation
    Reza, Md Kaykobad
    Prater-Bennette, Ashley
    Asif, M. Salman
    [J]. IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 599 - 610