Exploring Compositional Visual Generation with Latent Classifier Guidance

Cited by: 3
Authors
Shi, Changhao [1 ]
Ni, Haomiao [2 ]
Li, Kai [4 ]
Han, Shaobo [4 ]
Liang, Mingfu [3 ]
Min, Martin Renqiang [4 ]
Affiliations
[1] Univ Calif San Diego, San Diego, CA 92093 USA
[2] Penn State Univ, University Pk, PA USA
[3] Northwestern Univ, Evanston, IL USA
[4] NEC Labs Amer, Princeton, NJ USA
Keywords
DOI
10.1109/CVPRW59228.2023.00092
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space for compositional visual tasks. Specifically, we train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation for any pre-trained generative model with a semantic latent space. We demonstrate that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training. To maintain the original semantics during manipulation, we introduce a new guidance term, which we show is crucial for achieving compositionality. With additional assumptions, we show that the non-linear manipulation reduces to a simple latent arithmetic approach. We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images. Our findings suggest that latent classifier guidance is a promising approach that merits further exploration, even in the presence of other strong competing methods.
Pages: 853-862
Number of pages: 10
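
The abstract describes classifier guidance applied to a diffusion process over the semantic latent space of a pre-trained generator. Below is a minimal sketch of that idea under standard classifier-guided diffusion assumptions; the names (eps_model, classifier, G, guided_latent_sampling) and the DDIM-style update are hypothetical illustrations, not the authors' released implementation, and the paper's additional semantics-preserving guidance term for sequential manipulation is not shown.

```python
import torch

# Hypothetical components (illustrative names, not the authors' code):
#   eps_model(z_t, t)  -- latent diffusion model predicting the noise in z_t
#   classifier(z_t, t) -- auxiliary latent classifier returning attribute logits
#   A pre-trained generator G would decode the final latent z_0 into an image.

def guided_latent_sampling(eps_model, classifier, target_class, shape,
                           alphas_cumprod, guidance_scale=1.0, device="cpu"):
    """Classifier-guided, DDIM-style sampling in a semantic latent space.

    alphas_cumprod: 1-D tensor of cumulative noise-schedule terms (alpha-bar_t).
    """
    T = len(alphas_cumprod)
    z = torch.randn(shape, device=device)  # z_T ~ N(0, I)
    for t in reversed(range(T)):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0, device=device)

        # Guidance term: gradient of log p(y | z_t) w.r.t. the noisy latent.
        z_in = z.detach().requires_grad_(True)
        log_p = torch.log_softmax(classifier(z_in, t), dim=-1)[:, target_class].sum()
        grad = torch.autograd.grad(log_p, z_in)[0]

        with torch.no_grad():
            # Shift the predicted noise by the scaled classifier gradient.
            eps = eps_model(z, t) - guidance_scale * torch.sqrt(1.0 - a_t) * grad

            # Deterministic DDIM-style update on the latent code.
            z0_pred = (z - torch.sqrt(1.0 - a_t) * eps) / torch.sqrt(a_t)
            z = torch.sqrt(a_prev) * z0_pred + torch.sqrt(1.0 - a_prev) * eps
    return z  # decode with the pre-trained generator, e.g. image = G(z)
```

With guidance_scale set to 0 this reduces to unconditional latent sampling; in the compositional setting the abstract describes, one would combine gradients from several attribute classifiers, and manipulation of a real image would start from its encoded latent rather than from pure noise.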