DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

Cited by: 10
Authors
Wu, Weijia [1, 3]
Zhao, Yuzhong [2]
Shou, Mike Zheng [3]
Zhou, Hong [1]
Shen, Chunhua [1, 4]
Affiliations
[1] Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China
[3] National University of Singapore, Singapore, Singapore
[4] Ant Group, Hangzhou, Zhejiang, People's Republic of China
Funding
National Key Research and Development Program of China; National Research Foundation, Singapore
DOI
10.1109/ICCV51070.2023.00117
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Collecting and annotating images with pixel-wise labels is time-consuming and laborious. In contrast, synthetic data can be generated freely with a generative model (e.g., DALL-E, Stable Diffusion). In this paper, we show that it is possible to automatically obtain accurate semantic masks for synthetic images generated by the off-the-shelf Stable Diffusion model, which uses only text-image pairs during training. Our approach, termed DiffuMask, exploits the cross-attention map between text and image, which makes it natural and seamless to extend text-driven image synthesis to semantic mask generation. DiffuMask uses text-guided cross-attention information to localize class/word-specific regions, which are combined with practical techniques to create a high-resolution, class-discriminative pixel-wise mask. These methods significantly reduce data collection and annotation costs. Experiments demonstrate that existing segmentation methods trained on DiffuMask's synthetic data achieve performance competitive with their counterparts trained on real data (VOC 2012, Cityscapes). For some classes (e.g., bird), DiffuMask shows promising performance, close to the state-of-the-art result obtained with real data (within a 3% mIoU gap). Moreover, in the open-vocabulary (zero-shot) segmentation setting, DiffuMask achieves new state-of-the-art results on the unseen classes of VOC 2012. The project website can be found at DiffuMask.
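The idea of turning a word's cross-attention maps into a mask can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `upsample_nearest` and `attention_to_mask`, the fixed threshold of 0.5, and the toy input arrays are all assumptions made for illustration, and the "practical techniques" the abstract mentions for refining the mask are not modeled here. The sketch only averages attention maps of different resolutions, normalizes the result, and binarizes it.

```python
import numpy as np

def upsample_nearest(attn, size):
    """Nearest-neighbour upsampling of a square map to (size, size).

    Assumes `size` is an integer multiple of the map's side length.
    """
    factor = size // attn.shape[0]
    return np.kron(attn, np.ones((factor, factor)))

def attention_to_mask(attn_maps, size=64, threshold=0.5):
    """Average cross-attention maps for one class word and binarize them.

    attn_maps: list of square 2D arrays (hypothetical attention maps taken
               from layers of different resolution for a single text token).
    threshold: illustrative fixed cut-off, an assumption of this sketch.
    """
    resized = [upsample_nearest(np.asarray(a, dtype=float), size) for a in attn_maps]
    mean_map = np.mean(resized, axis=0)
    lo, hi = mean_map.min(), mean_map.max()
    norm = (mean_map - lo) / (hi - lo + 1e-8)  # min-max normalize to [0, 1)
    return (norm >= threshold).astype(np.uint8)

# Toy maps standing in for attention at two resolutions.
coarse = np.zeros((8, 8))
coarse[2:6, 2:6] = 1.0
fine = np.zeros((16, 16))
fine[4:12, 4:12] = 1.0

mask = attention_to_mask([coarse, fine], size=16)
print(mask.shape, int(mask.sum()))
```

In this toy case both maps highlight the same central square, so the resulting mask marks that square as foreground and everything else as background.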
Pages: 1206-1217
Page count: 12