Token Contrast for Weakly-Supervised Semantic Segmentation

Cited by: 29
Authors
Ru, Lixiang [1, 2, 3]
Zheng, Hehang [3]
Zhan, Yibing [3]
Du, Bo [1, 2]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Inst Artificial Intelligence, Natl Engn Res Ctr Multimedia Software, Wuhan, Peoples R China
[2] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan, Peoples R China
[3] JD Explore Acad, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.00302
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Weakly-Supervised Semantic Segmentation (WSSS) using image-level labels typically uses Class Activation Maps (CAM) to generate pseudo labels. Limited by the local structure perception of CNNs, CAM usually cannot identify the integral object regions. Though the recent Vision Transformer (ViT) can remedy this flaw, we observe that it also brings an over-smoothing issue, i.e., the final patch tokens tend to become uniform. In this work, we propose Token Contrast (ToCo) to address this issue and further explore the virtue of ViT for WSSS. Firstly, motivated by the observation that intermediate layers in ViT can still retain semantic diversity, we design a Patch Token Contrast module (PTC). PTC supervises the final patch tokens with the pseudo token relations derived from intermediate layers, allowing them to align with the semantic regions and thus yield more accurate CAM. Secondly, to further differentiate the low-confidence regions in CAM, we devise a Class Token Contrast module (CTC), inspired by the fact that class tokens in ViT can capture high-level semantics. CTC facilitates representation consistency between uncertain local regions and global objects by contrasting their class tokens. Experiments on the PASCAL VOC and MS COCO datasets show that the proposed ToCo can remarkably surpass other single-stage competitors and achieve performance comparable to state-of-the-art multi-stage methods. Code is available at https://github.com/rulixiang/ToCo.
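The patch token contrast idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes the pseudo token relations are given as per-token integer pseudo labels obtained from an intermediate layer, and it uses a simple cosine-similarity loss; the function names `cosine_similarity_matrix` and `patch_token_contrast_loss` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(tokens):
    """Pairwise cosine similarity between the rows of an (N, D) array."""
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.maximum(norms, 1e-8)  # guard against zero vectors
    return unit @ unit.T


def patch_token_contrast_loss(final_tokens, pseudo_labels):
    """Hypothetical contrast loss over final-layer patch tokens.

    final_tokens:  (N, D) final-layer patch tokens.
    pseudo_labels: (N,) integer pseudo labels, assumed to be derived
                   from an intermediate layer.
    Token pairs sharing a pseudo label are pulled together; pairs with
    different pseudo labels are pushed apart.
    """
    sim = cosine_similarity_matrix(final_tokens)
    same = pseudo_labels[:, None] == pseudo_labels[None, :]
    off_diagonal = ~np.eye(len(pseudo_labels), dtype=bool)
    positive = same & off_diagonal  # same label, excluding self-pairs
    negative = ~same                # different labels
    pos_loss = (1.0 - sim[positive]).mean() if positive.any() else 0.0
    neg_loss = np.maximum(sim[negative], 0.0).mean() if negative.any() else 0.0
    return float(pos_loss + neg_loss)


if __name__ == "__main__":
    labels = np.array([0, 0, 1, 1])
    # Tokens that follow the pseudo labels versus over-smoothed (uniform) tokens.
    separated = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    over_smoothed = np.ones((4, 2))
    print("separated:", patch_token_contrast_loss(separated, labels))
    print("over-smoothed:", patch_token_contrast_loss(over_smoothed, labels))
```

Under these assumptions, uniform final tokens are penalized because tokens with different pseudo labels remain highly similar, while tokens that follow the pseudo labels incur a loss near zero.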
Pages: 3093-3102
Number of pages: 10