MECPformer: multi-estimations complementary patch with CNN-transformers for weakly supervised semantic segmentation

Cited by: 3
Authors
Liu, Chunmeng [1 ]
Li, Guangyao [1 ]
Shen, Yao [1 ]
Wang, Ruiqi [1 ]
Affiliations
[1] Tongji Univ, Coll Elect & Informat Engn, Shanghai 201804, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2023, Vol. 35, Issue 31
Keywords
Weakly supervised learning; Semantic segmentation; Transformer; CNN; Computer vision;
DOI
10.1007/s00521-023-08816-2
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The initial seeds generated by convolutional neural networks (CNNs) for weakly supervised semantic segmentation highlight only the most discriminative regions and fail to capture global target information. Transformer-based methods have since been proposed, benefiting from their ability to capture long-range feature representations. However, we observe a flaw despite the strengths of the transformer: given a class, the initial seeds it generates may invade regions belonging to other classes. Motivated by these issues, we devise a simple yet effective method, dubbed MECPformer, with a multi-estimations complementary patch (MECP) strategy and an adaptive conflict module (ACM). Given an image, we manipulate it with the MECP strategy at different epochs, and the network mines and deeply fuses semantic information at different levels. In addition, the ACM adaptively removes conflicting pixels and exploits the network's self-training capability to mine potential target information. Without bells and whistles, our MECPformer reaches a new state-of-the-art 72.0% mIoU on the PASCAL VOC 2012 dataset and 42.4% on MS COCO 2014. The code is available at https://github.com/ChunmengLiu1/MECPformer.
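The abstract describes the MECP strategy only at a high level. As a rough illustration of the complementary-patch idea the strategy's name suggests (this sketch is our assumption, not the authors' exact procedure; the patch size, keep ratio, and random masking scheme are all hypothetical settings), a complementary pair can be formed by hiding a random subset of image patches in one view and the remaining patches in the other, so the two views jointly cover the full image:

```python
import numpy as np

def complementary_patch_pair(image, patch=16, keep_ratio=0.5, seed=None):
    """Split an image into two complementary patched views.

    A random binary mask over a patch grid hides some patches in one
    view and the complementary patches in the other, so the two views
    together cover the whole image. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    grid = rng.random((h // patch, w // patch)) < keep_ratio
    # Upsample the patch-level mask to pixel resolution.
    mask = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)
    if image.ndim == 3:  # broadcast the mask over the channel axis
        mask = mask[..., None]
    return image * mask, image * ~mask

img = np.arange(64 * 64 * 3, dtype=np.float32).reshape(64, 64, 3)
view_a, view_b = complementary_patch_pair(img, patch=16, seed=0)
assert np.allclose(view_a + view_b, img)  # the pair reconstructs the image
```

Because the two masks are exact complements, summing the views recovers the original image, which is what lets predictions on the pair be fused without losing coverage of any region.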
Pages: 23249-23264
Page count: 16
Related Papers (50 total)
  • [31] MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation
    Xu, Lian
    Bennamoun, Mohammed
    Boussaid, Farid
    Laga, Hamid
    Ouyang, Wanli
    Xu, Dan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 8380 - 8395
  • [32] Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation
    Bevandic, Petra
    Orsic, Marin
    Saric, Josip
    Grubisic, Ivan
    Segvic, Sinisa
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (07) : 2450 - 2472
  • [33] Multi-layered self-attention mechanism for weakly supervised semantic segmentation
    Yaganapu, Avinash
    Kang, Mingon
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 239
  • [34] A Weakly Supervised Multi-task Ranking Framework for Actor-Action Semantic Segmentation
    Yan, Yan
    Xu, Chenliang
    Cai, Dawen
    Corso, Jason J.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128: 1414 - 1432
  • [35] Weakly Supervised Semantic Segmentation using Constrained Multi-Image Model and Saliency Prior
    Yu, Mingjun
    Han, Zheng
    Wang, Pingquan
    Jia, Xiaoyan
    TENTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2018), 2018, 10806
  • [36] Multi-Modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation
    Li, Xiawei
    Xu, Qingyuan
    Zhang, Jing
    Zhang, Tianyi
    Yu, Qian
    Sheng, Lu
    Xu, Dong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3216 - 3224
  • [37] Enhancing weakly supervised semantic segmentation through multi-class token attention learning
    Luo, Huilan
    Zeng, Zhen
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (01)
  • [38] Weakly supervised semantic segmentation and optimization algorithm based on multi-scale feature model
    Xiong C.
    Zhi H.
    Tongxin Xuebao/Journal on Communications, 2019, 40 (01): 163 - 171
  • [39] Deconfounded multi-organ weakly-supervised semantic segmentation via causal intervention
    Chen, Kaitao
    Sun, Shiliang
    Du, Youtian
    INFORMATION FUSION, 2024, 108
  • [40] A Weakly Supervised Multi-task Ranking Framework for Actor-Action Semantic Segmentation
    Yan, Yan
    Xu, Chenliang
    Cai, Dawen
    Corso, Jason J.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128 (05) : 1414 - 1432