Exploring Compositional Visual Generation with Latent Classifier Guidance

Cited by: 3
Authors
Shi, Changhao [1 ]
Ni, Haomiao [2 ]
Li, Kai [4 ]
Han, Shaobo [4 ]
Liang, Mingfu [3 ]
Min, Martin Renqiang [4 ]
Affiliations
[1] Univ Calif San Diego, San Diego, CA 92093 USA
[2] Penn State Univ, University Pk, PA USA
[3] Northwestern Univ, Evanston, IL USA
[4] NEC Labs Amer, Princeton, NJ USA
Keywords
DOI
10.1109/CVPRW59228.2023.00092
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space for compositional visual tasks. Specifically, we train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation for any pre-trained generative model with a semantic latent space. We demonstrate that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training. To maintain the original semantics during manipulation, we introduce a new guidance term, which we show is crucial for achieving compositionality. With additional assumptions, we show that the non-linear manipulation reduces to a simple latent arithmetic approach. We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images. Our findings suggest that latent classifier guidance is a promising approach that merits further exploration, even in the presence of other strong competing methods.
Pages: 853-862
Number of pages: 10
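
The abstract describes classifier guidance applied to a diffusion process over the semantic latent space of a pre-trained generator. Below is a minimal sketch of that idea under standard classifier-guided diffusion assumptions; the names (eps_model, classifier, G, guided_latent_sampling) and the DDIM-style update are hypothetical illustrations, not the authors' released implementation, and the paper's additional semantics-preserving guidance term for sequential manipulation is not shown.

```python
import torch

# Hypothetical components (illustrative names, not the authors' code):
#   eps_model(z_t, t)  -- latent diffusion model predicting the noise in z_t
#   classifier(z_t, t) -- auxiliary latent classifier returning attribute logits
#   A pre-trained generator G would decode the final latent z_0 into an image.

def guided_latent_sampling(eps_model, classifier, target_class, shape,
                           alphas_cumprod, guidance_scale=1.0, device="cpu"):
    """Classifier-guided, DDIM-style sampling in a semantic latent space.

    alphas_cumprod: 1-D tensor of cumulative noise-schedule terms (alpha-bar_t).
    """
    T = len(alphas_cumprod)
    z = torch.randn(shape, device=device)  # z_T ~ N(0, I)
    for t in reversed(range(T)):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0, device=device)

        # Guidance term: gradient of log p(y | z_t) w.r.t. the noisy latent.
        z_in = z.detach().requires_grad_(True)
        log_p = torch.log_softmax(classifier(z_in, t), dim=-1)[:, target_class].sum()
        grad = torch.autograd.grad(log_p, z_in)[0]

        with torch.no_grad():
            # Shift the predicted noise by the scaled classifier gradient.
            eps = eps_model(z, t) - guidance_scale * torch.sqrt(1.0 - a_t) * grad

            # Deterministic DDIM-style update on the latent code.
            z0_pred = (z - torch.sqrt(1.0 - a_t) * eps) / torch.sqrt(a_t)
            z = torch.sqrt(a_prev) * z0_pred + torch.sqrt(1.0 - a_prev) * eps
    return z  # decode with the pre-trained generator, e.g. image = G(z)
```

With guidance_scale set to 0 this reduces to unconditional latent sampling; in the compositional setting the abstract describes, one would combine gradients from several attribute classifiers, and manipulation of a real image would start from its encoded latent rather than from pure noise.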