CLIP-Flow: Decoding images encoded in CLIP space

Times cited: 0
Authors
Ma, Hao [1 ]
Li, Ming [1 ]
Yang, Jingyuan [1 ]
Patashnik, Or [2 ]
Lischinski, Dani [3 ]
Cohen-Or, Daniel [1 ,2 ]
Huang, Hui [1 ]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Visual Comp Res Ctr, Shenzhen 518060, Peoples R China
[2] Tel Aviv Univ, Dept Comp Sci, IL-6997801 Tel Aviv, Israel
[3] Hebrew Univ Jerusalem, Sch Comp Sci & Engn, IL-91904 Jerusalem, Israel
Funding
National Natural Science Foundation of China; Israel Science Foundation;
Keywords
image-to-image; text-to-image; contrastive language-image pretraining (CLIP); flow; StyleGAN;
DOI
10.1007/s41095-023-0375-z
CLC classification
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
This study introduces CLIP-Flow, a novel network for generating images from a given image or text prompt. To effectively exploit the rich semantics shared by both modalities, we designed a semantics-guided methodology for image- and text-to-image synthesis. Specifically, we adopt Contrastive Language-Image Pretraining (CLIP) as an encoder to extract semantics and StyleGAN as a decoder to generate images from them. Moreover, to bridge the embedding space of CLIP and the latent space of StyleGAN, Real NVP is employed and modified with activation normalization and invertible convolution. Because images and text share the same representation space in CLIP, text prompts can be fed directly into CLIP-Flow for text-to-image synthesis. We conducted extensive experiments on several datasets to validate the effectiveness of the proposed image-to-image synthesis method, and further evaluated text-to-image synthesis on the public Multi-Modal CelebA-HQ dataset. Experiments show that our approach generates high-quality, text-matching images and is comparable with state-of-the-art methods, both qualitatively and quantitatively.
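The bridging idea in the abstract, an invertible flow mapping CLIP embeddings toward a StyleGAN-style latent space, can be sketched with one Glow-style flow step (activation normalization, invertible 1x1 convolution, affine coupling). This is a minimal illustration, not the authors' implementation: the toy dimension, the trivial conditioner, and the random orthogonal matrix are all assumptions for demonstration purposes.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy dimension; CLIP embeddings and StyleGAN W latents are typically 512-D

# ActNorm: per-channel scale and bias (data-dependent initialization omitted)
log_s = rng.normal(size=D) * 0.1
bias = rng.normal(size=D) * 0.1

# Invertible 1x1 "convolution": for vectors, simply an invertible DxD matrix.
# A random orthogonal matrix guarantees invertibility (inverse = transpose).
W, _ = np.linalg.qr(rng.normal(size=(D, D)))

def coupling_params(x_a):
    # Toy conditioner; in practice a small neural network predicts scale/shift.
    return np.tanh(x_a), x_a * 0.5

def flow_forward(x):
    x = x * np.exp(log_s) + bias              # actnorm
    x = x @ W                                  # invertible 1x1 conv
    x_a, x_b = x[: D // 2], x[D // 2 :]        # affine coupling: split channels
    s, t = coupling_params(x_a)
    return np.concatenate([x_a, x_b * np.exp(s) + t])

def flow_inverse(z):
    z_a, z_b = z[: D // 2], z[D // 2 :]
    s, t = coupling_params(z_a)                # z_a passed through unchanged
    x = np.concatenate([z_a, (z_b - t) * np.exp(-s)])
    x = x @ W.T                                # orthogonal: inverse is transpose
    return (x - bias) * np.exp(-log_s)

clip_embedding = rng.normal(size=D)            # stand-in for a CLIP image/text embedding
w_latent = flow_forward(clip_embedding)        # mapped toward the decoder's latent space
recovered = flow_inverse(w_latent)
print(np.allclose(recovered, clip_embedding))  # True: the mapping is exactly invertible
```

Invertibility is what lets a single trained flow translate between the two spaces in either direction; a full model would stack many such steps and train them so CLIP embeddings land on StyleGAN's latent distribution.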
Pages: 1157-1168
Page count: 12
Related papers
50 records in total
  • [1] AD-CLIP: Adapting Domains in Prompt Space Using CLIP
    Singha, Mainak
    Pal, Harsh
    Jha, Ankit
    Banerjee, Biplab
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 4357 - 4366
  • [2] Exploring CLIP for Assessing the Look and Feel of Images
    Wang, Jianyi
    Chan, Kelvin C. K.
    Loy, Chen Change
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023, : 2555 - 2563
  • [3] Migration of an aneurysm clip to the sacral subarachnoid space
    Yong Hwy Kim
    Jeong Eun Kim
    Hyun-Seung Kang
    Dae Hee Han
    Acta Neurochirurgica, 2009, 151 : 699 - 700
  • [4] Migration of an aneurysm clip to the sacral subarachnoid space
    Kim, Yong Hwy
    Kim, Jeong Eun
    Kang, Hyun-Seung
    Han, Dae Hee
    ACTA NEUROCHIRURGICA, 2009, 151 (06) : 699 - 700
  • [5] Contrast enhancement of portal images with adaptive histogram clip
    Gluhchev, G
    Shalev, S
    SCIA '97 - PROCEEDINGS OF THE 10TH SCANDINAVIAN CONFERENCE ON IMAGE ANALYSIS, VOLS 1 AND 2, 1997, : 531 - 531
  • [6] Contrast enhancement of portal images with adaptive histogram clip
    Gluhchev, Georgi
    Turkish Journal of Electrical Engineering & Computer Sciences, 5 (01): 139 - 145
  • [7] Documenting Clinical and Laboratory Images in Publications: The CLIP Principles
    Lang, Thomas A.
    Talerico, Cassandra
    Siontis, George C. M.
    CHEST, 2012, 141 (06) : 1626 - 1632
  • [8] DACS BOOST PERFORMANCE, CLIP POWER AND SPACE DEMANDS
    DONLIN, M
    COMPUTER DESIGN, 1989, 28 (02): : 51 - 53
  • [9] Delving into CLIP latent space for Video Anomaly Recognition
    Zanella, Luca
    Liberatori, Benedetta
    Menapace, Willi
    Poiesi, Fabio
    Wang, Yiming
    Ricci, Elisa
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249
  • [10] MotionCLIP: Exposing Human Motion Generation to CLIP Space
    Tevet, Guy
    Gordon, Brian
    Hertz, Amir
    Bermano, Amit H.
    Cohen-Or, Daniel
    COMPUTER VISION, ECCV 2022, PT XXII, 2022, 13682 : 358 - 374