Exploring Compositional Visual Generation with Latent Classifier Guidance

Cited by: 3
Authors
Shi, Changhao [1]
Ni, Haomiao [2]
Li, Kai [4]
Han, Shaobo [4]
Liang, Mingfu [3]
Min, Martin Renqiang [4]
Affiliations
[1] Univ Calif San Diego, San Diego, CA 92093 USA
[2] Penn State Univ, University Pk, PA USA
[3] Northwestern Univ, Evanston, IL USA
[4] NEC Labs Amer, Princeton, NJ USA
DOI
10.1109/CVPRW59228.2023.00092
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space for compositional visual tasks. Specifically, we train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation for any pre-trained generative model with a semantic latent space. We demonstrate that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training. To maintain the original semantics during manipulation, we introduce a new guidance term, which we show is crucial for achieving compositionality. With additional assumptions, we show that the non-linear manipulation reduces to a simple latent arithmetic approach. We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images. Our findings suggest that latent classifier guidance is a promising approach that merits further exploration, even in the presence of other strong competing methods.
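The abstract does not give the paper's equations, so the following is only a toy illustration of the general idea of classifier guidance in a latent space, not the authors' method. Classifier guidance adds the gradient of a classifier's log-likelihood to the score of the latent prior, since grad_z log p(z | y) = grad_z log p(z) + grad_z log p(y | z). The sketch assumes a standard Gaussian latent prior (score −z) and a hypothetical linear logistic classifier p(y=1 | z) = sigmoid(w·z + b). Both are assumptions made for illustration, as are all function names, and sampling uses plain Langevin dynamics rather than a trained latent diffusion model.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def prior_score(z):
    # Assumed standard Gaussian latent prior: grad_z log N(z; 0, I) = -z
    return -z


def classifier_score(z, w, b):
    # Hypothetical linear logistic classifier, p(y=1 | z) = sigmoid(w.z + b).
    # grad_z log p(y=1 | z) = (1 - sigmoid(w.z + b)) * w, always parallel to w.
    # z has shape (n_samples, latent_dim); w has shape (latent_dim,).
    return (1.0 - sigmoid(z @ w + b))[:, None] * w


def guided_langevin(z, w, b, scale=1.0, step=0.01, n_steps=200, seed=0):
    # Unadjusted Langevin dynamics targeting p(z) * p(y=1 | z) ** scale.
    # scale=0 samples the prior alone; larger scale strengthens the guidance.
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        score = prior_score(z) + scale * classifier_score(z, w, b)
        z = z + step * score + np.sqrt(2.0 * step) * rng.standard_normal(z.shape)
    return z


# Usage: steer 2-D latents toward the class whose direction is w.
w = np.array([3.0, 0.0])
z0 = np.zeros((2000, 2))
guided = guided_langevin(z0, w, b=0.0, scale=5.0, n_steps=500)
plain = guided_langevin(z0, w, b=0.0, scale=0.0, n_steps=500)
```

With a linear classifier, the guidance gradient at every latent is a positive multiple of the fixed direction w, so each guided step effectively adds a multiple of w to z. This is one way to see how guided manipulation can collapse to latent arithmetic (z + αw) under restrictive assumptions; the specific assumptions used in the paper are not stated in the abstract.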
Pages: 853–862
Page count: 10