Text-Guided Multi-region Scene Image Editing Based on Diffusion Model

Times cited: 0

Authors:

Li, Ruichen [1]

Wu, Lei [1]

Wang, Changshuo [1]

Dong, Pei [1]

Li, Xin [1]

Affiliations:

[1] Shandong Univ, Jinan, Peoples R China

Source:

ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XI, ICIC 2024 | 2024 / Vol. 14872

Keywords:

Text-guided image editing; Diffusion model; Image manipulation;

DOI:

10.1007/978-981-97-5612-4_20

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models, has finally enabled text-guided editing of realistic scene images. The latest works use diffusion models, and most studies focus on editing individual regions based on a given text prompt. When the user delineates multiple regions, these models cannot edit each region according to its own text semantics. Hence, we propose a new diffusion-based text-guided multi-region scene image editing model, which can handle multiple regions with their corresponding texts and focuses on entity-level object editing and layout-level background coordination at different denoising steps, respectively. In the early denoising steps, we propose a mask-dilation-based object editing method that dilates thinner masks to ensure the accuracy of editing multiple objects. In layout-level background coordination, we not only encourage the noisy version of the original scene image to replace the random noise in the background region during the reverse diffusion process, but also propose Outward Low-pass Filtering (OutwardLPF) to eliminate the sharp transitions of noise levels between edited image regions. We conduct extensive experiments showing that our model outperforms all baselines in terms of multi-object entity editing and background coordination.
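The two mechanisms named in the abstract, dilating thin region masks and replacing random noise in the background with a noised copy of the original image, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`dilate_mask`, `noise_original`, `blend_background`), the square structuring element, and the use of a cumulative noise-schedule value `alpha_bar_t` are all assumptions made for illustration, and OutwardLPF is not sketched because this record gives no details of it.

```python
import numpy as np


def dilate_mask(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square (assumed shape)."""
    mask = mask.astype(bool)
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask inside the square window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def noise_original(x0: np.ndarray, alpha_bar_t: float, rng) -> np.ndarray:
    """Standard forward diffusion: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def blend_background(x_t_edit, x0, masks, alpha_bar_t, rng):
    """Keep edited values inside the union of region masks and use the
    noised original image everywhere else (hypothetical helper)."""
    union = np.zeros(x0.shape[:2], dtype=bool)
    for m in masks:
        union |= m.astype(bool)
    x_t_orig = noise_original(x0, alpha_bar_t, rng)
    keep = union[..., None] if x0.ndim == 3 else union
    return np.where(keep, x_t_edit, x_t_orig)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    thin = np.zeros((5, 5), dtype=bool)
    thin[2, 2] = True                      # a one-pixel "thin" region
    region = dilate_mask(thin, radius=1)   # grows to a 3x3 block
    x0 = np.zeros((5, 5, 3))               # stand-in for the original image
    x_edit = np.ones((5, 5, 3))            # stand-in for the edited sample
    x_t = blend_background(x_edit, x0, [region], alpha_bar_t=0.5, rng=rng)
    print(region.sum(), x_t.shape)
```

In a real sampler the blend would be applied at each reverse step with that step's noise level; here a single call is enough to show the masking logic.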

Citation

Pages: 229-240

Page count: 12

