Exploring Compositional Visual Generation with Latent Classifier Guidance

Cited by: 3

Authors:

Shi, Changhao [1]

Ni, Haomiao [2]

Li, Kai [4]

Han, Shaobo [4]

Liang, Mingfu [3]

Min, Martin Renqiang [4]

Affiliations:

[1] Univ Calif San Diego, San Diego, CA 92093 USA

[2] Penn State Univ, University Pk, PA USA

[3] Northwestern Univ, Evanston, IL USA

[4] NEC Labs Amer, Princeton, NJ USA

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW | 2023

DOI:

10.1109/CVPRW59228.2023.00092

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space for compositional visual tasks. Specifically, we train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation for any pre-trained generative model with a semantic latent space. We demonstrate that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training. To maintain the original semantics during manipulation, we introduce a new guidance term, which we show is crucial for achieving compositionality. With additional assumptions, we show that the non-linear manipulation reduces to a simple latent arithmetic approach. We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images. Our findings suggest that latent classifier guidance is a promising approach that merits further exploration, even in the presence of other strong competing methods.

Pages: 853-862

Page count: 10
