Compositional Transformers for Scene Generation

Cited by: 0

Authors:

Hudson, Drew A. [1]

Zitnick, C. Lawrence [2]

Affiliations:

[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

[2] Facebook Inc, Facebook AI Res, Menlo Pk, CA USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce the GANformer2 model, an iterative object-oriented transformer, explored for the task of generative modeling. The network incorporates strong and explicit structural priors, to reflect the compositional nature of visual scenes, and synthesizes images through a sequential process. It operates in two stages: a fast and lightweight planning phase, where we draft a high-level scene layout, followed by an attention-based execution phase, where the layout is being refined, evolving into a rich and detailed picture. Our model moves away from conventional black-box GAN architectures that feature a flat and monolithic latent space towards a transparent design that encourages efficiency, controllability and interpretability. We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing it successfully achieves state-of-the-art performance in terms of visual quality, diversity and consistency. Further experiments demonstrate the model's disentanglement and provide a deeper insight into its generative process, as it proceeds step-by-step from a rough initial sketch, to a detailed layout that accounts for objects' depths and dependencies, and up to the final high-resolution depiction of vibrant and intricate real-world scenes. See https://github.com/dorarad/gansformer for model implementation.


Pages: 15
