Compositional Transformers for Scene Generation

Cited by: 0
Authors
Hudson, Drew A. [1]
Zitnick, C. Lawrence [2]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Facebook Inc, Facebook AI Res, Menlo Pk, CA USA
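Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code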
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce the GANformer2 model, an iterative object-oriented transformer, explored for the task of generative modeling. The network incorporates strong and explicit structural priors to reflect the compositional nature of visual scenes, and synthesizes images through a sequential process. It operates in two stages: a fast and lightweight planning phase, where we draft a high-level scene layout, followed by an attention-based execution phase, where the layout is refined, evolving into a rich and detailed picture. Our model moves away from conventional black-box GAN architectures that feature a flat and monolithic latent space towards a transparent design that encourages efficiency, controllability and interpretability. We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing it successfully achieves state-of-the-art performance in terms of visual quality, diversity and consistency. Further experiments demonstrate the model's disentanglement and provide a deeper insight into its generative process, as it proceeds step-by-step from a rough initial sketch, to a detailed layout that accounts for objects' depths and dependencies, and up to the final high-resolution depiction of vibrant and intricate real-world scenes. See https://github.com/dorarad/gansformer for model implementation.
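The two-stage idea in the abstract (draft a layout, then refine it into an image) can be illustrated with a toy sketch. This is not the GANformer2 implementation: the function names `plan_layout` and `execute`, the random linear map standing in for a generator, and all sizes are assumptions chosen only to show how a per-segment layout and per-segment latents might be composed into one picture.

```python
import numpy as np

def plan_layout(rng, num_segments=4, size=16):
    """Toy planning stage: draft a coarse layout as soft per-segment masks."""
    logits = rng.normal(size=(num_segments, size, size))
    # Softmax over the segment axis: every pixel gets a distribution over segments.
    exp = np.exp(logits - logits.max(axis=0, keepdims=True))
    return exp / exp.sum(axis=0, keepdims=True)

def execute(rng, masks, latent_dim=8):
    """Toy execution stage: each segment's own latent decides how its region looks."""
    num_segments = masks.shape[0]
    latents = rng.normal(size=(num_segments, latent_dim))  # one latent per segment
    to_rgb = rng.normal(size=(latent_dim, 3))              # hypothetical stand-in for a generator
    colors = 1.0 / (1.0 + np.exp(-(latents @ to_rgb)))     # (segments, 3), values in (0, 1)
    # Blend the segment colors according to the layout masks.
    return np.einsum("khw,kc->hwc", masks, colors)

rng = np.random.default_rng(0)
masks = plan_layout(rng)
image = execute(rng, masks)
print(masks.shape, image.shape)
```

Because each segment keeps its own latent, changing one row of `latents` alters only the region that segment's mask covers, which is a rough analogue of the controllability the abstract contrasts with a single flat latent space.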
Pages: 15
Related papers
50 in total
  • [1] SceneFormer: Indoor Scene Generation with Transformers
    Wang, Xinpeng
    Yeshwanth, Chandan
    Niesner, Matthias
    2021 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2021), 2021, : 106 - 115
  • [2] Relation Detection with Transformers for Panoptic Scene Graph Generation
    Liu, Chang
    Yan, Wenchao
    Chen, Shilin
    Huang, Liqun
    Huang, Xiaotao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT IV, 2025, 15034 : 223 - 238
  • [3] Composite Relationship Fields with Transformers for Scene Graph Generation
    Adaimi, George
    Mizrahi, David
    Alahi, Alexandre
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 52 - 64
  • [4] Compositional Feature Augmentation for Unbiased Scene Graph Generation
    Li, Lin
    Chen, Guikun
    Xiao, Jun
    Yang, Yi
    Wang, Chunping
    Chen, Long
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21628 - 21638
  • [5] IS-GGT: Iterative Scene Graph Generation with Generative Transformers
    Kundu, Sanjoy
    Aakur, Sathyanarayanan N.
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6292 - 6301
  • [6] Compositional 3D Scene Generation using Locally Conditioned Diffusion
    Po, Ryan
    Wetzstein, Gordon
    2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024, : 651 - 663
  • [7] Context-aware Scene Graph Generation with Seq2Seq Transformers
    Lu, Yichao
    Rai, Himanshu
    Chang, Jason
    Knyazev, Boris
    Yu, Guangwei
    Shekhar, Shashank
    Taylor, Graham W.
    Volkovs, Maksims
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 15911 - 15921
  • [8] Making Transformers Solve Compositional Tasks
    Ontanon, Santiago
    Ainslie, Joshua
    Cvicek, Vaclav
    Fisher, Zachary
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 3591 - 3607
  • [9] State-Aware Compositional Learning Toward Unbiased Training for Scene Graph Generation
    He, Tao
    Gao, Lianli
    Song, Jingkuan
    Li, Yuan-Fang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 43 - 56
  • [10] Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
    Bitton, Yonatan
    Stanovsky, Gabriel
    Schwartz, Roy
    Elhadad, Michael
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 94 - 123