Where you edit is what you get: Text-guided image editing with region-based attention

Cited by: 7
Authors
Xiao, Changming [1 ,2 ,3 ]
Yang, Qi [1 ,2 ,3 ]
Xu, Xiaoqiang [4 ]
Zhang, Jianwei [5 ]
Zhou, Feng [4 ]
Zhang, Changshui [1 ,2 ,3 ]
Affiliations
[1] Tsinghua Univ THUAI, Inst Artificial Intelligence, Beijing 100084, Peoples R China
[2] Beijing Natl Res Ctr Informat Sci & Technol BNRist, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[4] Aibee Inc, Algorithm Res, Beijing, Peoples R China
[5] Univ Hamburg, Inst Tech Aspects Multimodal Syst TAMS, Dept Informat, Hamburg, Germany
Keywords
Generative adversarial networks; Text-guided image editing; Spatial disentanglement; GENERATION
DOI
10.1016/j.patcog.2023.109458
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Leveraging the abundant knowledge learned by pre-trained multi-modal models like CLIP has recently proved effective for text-guided image editing. Although convincing results have been achieved by combining the image generator StyleGAN with CLIP, most methods need to train separate models for different prompts, and irrelevant regions are often changed after editing due to the lack of spatial disentanglement. We propose a novel framework that can edit different images according to different prompts within a single model. In addition, an innovative region-based spatial attention mechanism is adopted to explicitly guarantee the locality of editing. Experiments, conducted mainly in the face domain, verify the feasibility of our framework and show that, with multi-text editing and local editing in place, our method supports practical applications such as sequential editing and regional style transfer. © 2023 Elsevier Ltd. All rights reserved.
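To make the locality mechanism concrete, below is a minimal, hypothetical PyTorch sketch of region-masked feature blending in the spirit the abstract describes: text-edited generator features replace the original ones only where a spatial attention mask is active, so regions outside the mask stay untouched. The function name region_blend, the tensor shapes, and the sigmoid/bilinear choices are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of region-based blending for local editing; not the
# authors' code. Assumes (B, C, H, W) generator feature maps and a learned
# single-channel map of region scores.
import torch
import torch.nn.functional as F


def region_blend(feat_orig: torch.Tensor,
                 feat_edit: torch.Tensor,
                 mask_logits: torch.Tensor) -> torch.Tensor:
    """Blend text-edited and original feature maps with a soft spatial mask."""
    # Soft mask in [0, 1]; a sigmoid keeps the blend differentiable.
    mask = torch.sigmoid(mask_logits)
    # Upsample the mask to the feature resolution (bilinear is one common choice).
    mask = F.interpolate(mask, size=feat_orig.shape[-2:],
                         mode="bilinear", align_corners=False)
    # Edited features inside the region, original features outside: this is
    # what explicitly enforces the locality of the edit.
    return mask * feat_edit + (1.0 - mask) * feat_orig


if __name__ == "__main__":
    b, c, h, w = 1, 512, 32, 32
    blended = region_blend(torch.randn(b, c, h, w),
                           torch.randn(b, c, h, w),
                           torch.randn(b, 1, 16, 16))
    print(blended.shape)  # torch.Size([1, 512, 32, 32])
```

With a hard binary mask this formula reduces to cut-and-paste compositing; a soft, learnable mask lets the region itself be optimized jointly with the edit.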
Pages: 12
Related papers
8 items in total
  • [1] Text-Guided Image Editing Based on Post Score for Gaining Attention on Social Media
    Watanabe, Yuto
    Togo, Ren
    Maeda, Keisuke
    Ogawa, Takahiro
    Haseyama, Miki
    SENSORS, 2024, 24 (03)
  • [2] Controlling Attention Map Better for Text-Guided Image Editing Diffusion Models
    Xu, Siqi
    Sun, Lijun
    Liu, Guanming
    Wei, Zhihua
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873 : 54 - 65
  • [3] Image-based cell sorting: what you see is what you get
    Cassiday, Laura
    ANALYTICAL CHEMISTRY, 2008, 80 (01) : 11 - 11
  • [4] Validation of fluoroscopy based navigation in the hip region. What you see is what you get?
    Schep, NWL
    van Walsum, T
    de Graaf, JS
    Broeders, IAMJ
    van der Werken, C
    CARS 2002: COMPUTER ASSISTED RADIOLOGY AND SURGERY, PROCEEDINGS, 2002, : 247 - 251
  • [5] Image-Based Ground Visibility for Aviation: Is What You See What You Get? (Pilot Study)
    Kratchounova, Daniela
    Newton, David C.
    Hood, Robbie
    VIRTUAL, AUGMENTED AND MIXED REALITY: APPLICATIONS AND CASE STUDIES, VAMR 2019, PT II, 2019, 11575 : 516 - 528
  • [6] Text-guided floral image generation based on lightweight deep attention feature fusion GAN
    Yang, Wenji
    An, Hang
    Hu, Wenchao
    Ma, Xinxin
    Xie, Liping
    VISUAL COMPUTER, 2024: 3519 - 3535
  • [7] Aligning Where to See and What to Tell: Image Captioning with Region-Based Attention and Scene-Specific Contexts
    Fu, Kun
    Jin, Junqi
    Cui, Runpeng
    Sha, Fei
    Zhang, Changshui
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) : 2321 - 2334
  • [8] Watch What You Just Said: Image Captioning with Text-Conditional Attention
    Zhou, Luowei
    Xu, Chenliang
    Koch, Parker
    Corso, Jason J.
    PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017, : 305 - 313