Exploring Compositional Visual Generation with Latent Classifier Guidance

Cited by: 3
Authors
Shi, Changhao [1]
Ni, Haomiao [2]
Li, Kai [4]
Han, Shaobo [4]
Liang, Mingfu [3]
Min, Martin Renqiang [4]
Affiliations
[1] Univ Calif San Diego, San Diego, CA 92093 USA
[2] Penn State Univ, University Pk, PA USA
[3] Northwestern Univ, Evanston, IL USA
[4] NEC Labs Amer, Princeton, NJ USA
DOI
10.1109/CVPRW59228.2023.00092
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space for compositional visual tasks. Specifically, we train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation for any pre-trained generative model with a semantic latent space. We demonstrate that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training. To maintain the original semantics during manipulation, we introduce a new guidance term, which we show is crucial for achieving compositionality. With additional assumptions, we show that the non-linear manipulation reduces to a simple latent arithmetic approach. We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images. Our findings suggest that latent classifier guidance is a promising approach that merits further exploration, even in the presence of other strong competing methods.
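The abstract's remark that non-linear manipulation can reduce to simple latent arithmetic can be illustrated with a toy sketch. It is not the paper's method: it omits the latent diffusion model entirely and assumes a hypothetical linear logistic classifier p(y=1 | z) = sigmoid(w·z + b) over a latent vector z. Under that assumption, the guidance gradient of log p(y=1 | z) is always a scalar multiple of w, so repeated guidance steps only ever move z along one fixed direction. The names `guidance_grad`, `guide`, `w`, `b`, `steps`, and `scale` are illustrative and do not come from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def guidance_grad(z, w, b):
    """Gradient of log sigmoid(w.z + b) with respect to z.

    For a linear logistic classifier (an assumption of this sketch),
    the gradient is (1 - p) * w, i.e. always parallel to w.
    """
    s = 1.0 - sigmoid(dot(w, z) + b)
    return [s * wi for wi in w]

def guide(z0, w, b, steps=50, scale=0.5):
    """Repeated gradient-ascent steps on log p(y=1 | z).

    A toy stand-in for the classifier-guidance term only; no denoising
    network or noise schedule is modelled here.
    """
    z = list(z0)
    for _ in range(steps):
        g = guidance_grad(z, w, b)
        z = [zi + scale * gi for zi, gi in zip(z, g)]
    return z

if __name__ == "__main__":
    z0 = [0.3, -1.2, 0.7]   # hypothetical latent code
    w = [2.0, 0.0, -1.0]    # hypothetical classifier weights
    z = guide(z0, w, b=0.0)
    delta = [zi - z0i for zi, z0i in zip(z, z0)]
    print("displacement:", delta)  # parallel to w
    print("p before:", sigmoid(dot(w, z0)), "p after:", sigmoid(dot(w, z)))
```

Because the total displacement is a multiple of w, the same edit could be written as z0 + alpha * w for some scalar alpha, which is the latent arithmetic form. With a non-linear classifier the gradient direction would change with z, and this shortcut would no longer hold.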
Pages: 853-862
Number of pages: 10