Text-Guided Multi-region Scene Image Editing Based on Diffusion Model

Authors
Li, Ruichen [1]
Wu, Lei [1]
Wang, Changshuo [1]
Dong, Pei [1]
Li, Xin [1]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
Keywords
Text-guided image editing; Diffusion model; Image manipulation;
DOI
10.1007/978-981-97-5612-4_20
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models, has finally enabled text-guided editing of realistic scene images. The latest works use diffusion models, and most studies focus on editing a single region based on a given text prompt. When the user delineates multiple regions, these models cannot edit each area according to its own text semantics. Hence, we propose a new diffusion-based text-guided multi-region scene image editing model that handles multiple regions with their corresponding texts and focuses on entity-level object editing and layout-level background coordination at different denoising steps, respectively. In the early denoising steps, we propose a mask-dilation-based object editing method that dilates thinner masks to ensure accurate editing of multiple objects. For layout-level background coordination, we not only encourage the noisy version of the original scene image to replace the random noise in the background region during the reverse diffusion process, but also propose Outward Low-pass Filtering (OutwardLPF) to eliminate sharp transitions in noise level between edited image regions. We conduct extensive experiments showing that our model outperforms all baselines in multi-object entity editing and background coordination.
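The two mechanisms the abstract names most concretely, mask dilation and replacing background noise with a noised copy of the original image, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (dilate_mask, noise_to_step, blend_background), the square structuring element, the radius parameter, and the use of the standard DDPM forward-noising formula are all assumptions made for illustration. OutwardLPF is not sketched, because the abstract does not specify how it is computed.

```python
import numpy as np


def dilate_mask(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    """Binary dilation with a (2*radius+1) square element (illustrative choice)."""
    mask = mask.astype(bool)
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask inside the square window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def noise_to_step(x0: np.ndarray, alpha_bar_t: float, rng: np.random.Generator) -> np.ndarray:
    """Standard DDPM forward noising: sqrt(a)*x0 + sqrt(1-a)*eps (assumed schedule)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def blend_background(x_t_edit, x0_orig, region_masks, alpha_bar_t, rng):
    """Keep the edited latent inside the union of region masks and
    a noised copy of the original image everywhere else."""
    edit_region = np.zeros(x0_orig.shape[:2], dtype=bool)
    for m in region_masks:
        edit_region |= m.astype(bool)
    x_t_orig = noise_to_step(x0_orig, alpha_bar_t, rng)
    # Broadcast the (H, W) mask over the channel axis.
    return np.where(edit_region[..., None], x_t_edit, x_t_orig)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    thin = np.zeros((8, 8), dtype=bool)
    thin[4, 2:6] = True                      # a one-pixel-thick region
    wide = dilate_mask(thin, radius=1)       # thickened before editing
    x0 = rng.random((8, 8, 3))               # stand-in for the original image
    x_t = rng.standard_normal((8, 8, 3))     # stand-in for the edited noisy sample
    blended = blend_background(x_t, x0, [wide], alpha_bar_t=0.5, rng=rng)
    print(thin.sum(), wide.sum(), blended.shape)
```

In a real sampler, a blending step of this kind would run once per denoising step with that step's cumulative alpha value, and each region would be denoised under its own text prompt; both details are left out here.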
Pages: 229 - 240
Page count: 12