Multi-Region Text-Driven Manipulation of Diffusion Imagery

Cited by: 0
Authors
Li, Yiming [1, 2]
Zhou, Peng [3]
Sun, Jun [1]
Xu, Yi [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai Key Lab Digital Media Proc & Transmiss, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, AI Inst, MoE, Key Lab Artificial Intelligence, Shanghai, Peoples R China
[3] China Mobile Suzhou Software Technol Co Ltd, Suzhou, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-guided image manipulation has attracted significant attention recently. Prevailing techniques concentrate on editing the attributes of individual objects but struggle with multi-object editing, mainly because they lack consistency constraints on the spatial layout. This work presents a multi-region guided image manipulation framework that enables manipulation through region-level textual prompts. With MultiDiffusion as a baseline, we focus on automatically generating a rational multi-object spatial distribution in which disparate regions are fused into a unified whole. To mitigate interference from regional fusion, we employ an off-the-shelf model (CLIP) to impose region-aware spatial guidance on multi-object manipulation. Moreover, when the framework is applied to Stable Diffusion, lengthy words that are quality-related yet object-agnostic hamper the manipulation. To keep the focus on meaningful, object-specific words for efficient guidance and generation, we introduce a keyword selection method. Furthermore, we demonstrate a downstream application of our method to multi-region inversion, which is tailored for manipulating multiple objects in real images. Our approach is compatible with variants of Stable Diffusion models and is readily applicable to manipulating diverse objects across a wide range of images with high-quality generation, showing superb image control capabilities. Code is available at https://github.com/liyiming09/multi-region-guided-diffusion.
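The abstract does not spell out how regions are fused, so the following is only a rough illustration of the general idea behind MultiDiffusion-style fusion: each region prompt yields its own denoised output, and the outputs are combined by a per-pixel, mask-weighted average. The function name `fuse_regions`, the nested-list "latent", and the choice to leave uncovered pixels at 0.0 are assumptions made for this sketch; none of it is taken from the authors' repository, and the CLIP guidance and keyword selection steps are not modeled.

```python
from typing import List, Sequence

Grid = List[List[float]]


def fuse_regions(region_outputs: Sequence[Grid], region_masks: Sequence[Grid]) -> Grid:
    """Per-pixel mask-weighted average of region-wise denoised outputs.

    Pixels covered by no mask are left at 0.0 (an arbitrary choice for this sketch).
    """
    if len(region_outputs) != len(region_masks) or not region_outputs:
        raise ValueError("need one mask per region output, and at least one region")
    height, width = len(region_outputs[0]), len(region_outputs[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weighted_sum = 0.0
            weight_total = 0.0
            # Accumulate each region's contribution at this pixel.
            for output, mask in zip(region_outputs, region_masks):
                weighted_sum += mask[y][x] * output[y][x]
                weight_total += mask[y][x]
            if weight_total > 0.0:
                fused[y][x] = weighted_sum / weight_total
    return fused


# Toy example: two regions on a 1x3 "latent" that overlap in the middle pixel.
background = [[1.0, 1.0, 1.0]]
foreground = [[5.0, 5.0, 5.0]]
background_mask = [[1.0, 1.0, 0.0]]
foreground_mask = [[0.0, 1.0, 1.0]]
fused = fuse_regions([background, foreground], [background_mask, foreground_mask])
print(fused)  # [[1.0, 3.0, 5.0]]
```

In the overlapping pixel the two region outputs are simply averaged, which hints at why fusion alone can blur object boundaries and why the abstract adds region-aware guidance on top of it.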
Pages: 3261-3269
Page count: 9
Related papers
  • [1] StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
    Patashnik, Or
    Wu, Zongze
    Shechtman, Eli
    Cohen-Or, Daniel
    Lischinski, Dani
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 2065 - 2074
  • [2] TmfimCLIP: Text-Driven Multi-Attribute Face Image Manipulation
    Yaermaimaiti, Yilihamu
    Wang, Ruohao
    Lou, Xudong
    Liu, Yajie
    Xi, Linfei
    INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2024,
  • [3] Multi-channel correlated diffusion for text-driven artistic style transfer
    Jiang, Guoquan
    Wang, Canyu
    Huo, Zhanqiang
    Xu, Huan
    VISUAL COMPUTER, 2025,
  • [4] DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
    Lyu, Yueming
    Lin, Tianwei
    Li, Fu
    He, Dongliang
    Dong, Jing
    Tan, Tieniu
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6894 - 6903
  • [5] DIFFERENTIAL EFFECTS OF PICTURE- AND TEXT-DRIVEN EMOTIONAL IMAGERY
    Limberg, Anke
    Thees, Monique
    Weber, Carolin
    Hamm, Alfons O.
    Wendt, Julia
    PSYCHOPHYSIOLOGY, 2013, 50 : S94 - S95
  • [6] Blended Diffusion for Text-driven Editing of Natural Images
    Avrahami, Omri
    Lischinski, Dani
    Fried, Ohad
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18187 - 18197
  • [7] Text-driven Face Image Generation and Manipulation via Multi-level Residual Mapper
    Li Z.-L.
    Zhang S.-P.
    Liu Y.
    Zhang Z.-X.
    Zhang W.-G.
    Huang Q.-M.
    Ruan Jian Xue Bao/Journal of Software, 2023, 34 (05) : 2101 - 2115
  • [8] MotionDiffuse: Text-Driven Human Motion Generation With Diffusion Model
    Zhang, Mingyuan
    Cai, Zhongang
    Pan, Liang
    Hong, Fangzhou
    Guo, Xinying
    Yang, Lei
    Liu, Ziwei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (06) : 4115 - 4128
  • [9] TexFit: Text-Driven Fashion Image Editing with Diffusion Models
    Wang, Tongxin
    Ye, Mang
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 9, 2024, : 10198 - 10206
  • [10] DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization
    Huang, Nisha
    Zhang, Yuxin
    Tang, Fan
    Ma, Chongyang
    Huang, Haibin
    Dong, Weiming
    Xu, Changsheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (02) : 3370 - 3383