Hierarchical Image Generation via Transformer-Based Sequential Patch Selection

Cited by: 0
Authors
Xu, Xiaogang [1]
Xu, Ning [2]
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Adobe Res, San Jose, CA USA
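Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract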
To synthesize images with preferred objects and interactions, a controllable approach is to generate the image from a scene graph and a large pool of object crops, where the scene graph defines the spatial arrangement of the objects in the image and the crops retrieved from the pool determine their appearance. In this paper, we propose a novel framework built on this semi-parametric generation strategy. First, to encourage the retrieval of mutually compatible crops, we design a sequential selection strategy in which the crop selected for each object depends on the contents and locations of all previously chosen object crops. This process is implemented with a transformer trained with contrastive losses. Second, to generate the final image, our hierarchical generation strategy uses hierarchical gated convolutions to synthesize areas not covered by any image crop, together with a patch-guided spatially adaptive normalization module that ensures the final generated images comply with the crop appearance and the scene graph. Evaluated on the challenging Visual Genome and COCO-Stuff datasets, our method outperforms existing state-of-the-art methods.
Pages: 2938 - 2945
Page count: 8
Related papers
50 in total
  • [31] Transformer-based partner dance motion generation
    Wu, Ying
    Wu, Zizhao
    Ji, Chengtao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 139
  • [32] Transformer-based Point Cloud Generation Network
    Xu, Rui
    Hui, Le
    Han, Yuehui
    Qian, Jianjun
    Xie, Jin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4169 - 4177
  • [33] A Review of Transformer-Based Approaches for Image Captioning
    Ondeng, Oscar
    Ouma, Heywood
    Akuon, Peter
    APPLIED SCIENCES-BASEL, 2023, 13 (19)
  • [34] Progressive Transformer-Based Generation of Radiology Reports
    Nooralahzadeh, Farhad
    Gonzalez, Nicolas Perez
    Frauenfelder, Thomas
    Fujimoto, Koji
    Krauthammer, Michael
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 2824 - 2832
  • [35] Transformer-based Natural Language Understanding and Generation
    Zhang, Feng
    An, Gaoyun
    Ruan, Qiuqi
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 281 - 284
  • [36] HIPA: Hierarchical Patch Transformer for Single Image Super Resolution
    Cai, Qing
    Qian, Yiming
    Li, Jinxing
    Lyu, Jun
    Yang, Yee-Hong
    Wu, Feng
    Zhang, David
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 3226 - 3237
  • [37] A Transformer-Based Hierarchical Variational AutoEncoder Combined Hidden Markov Model for Long Text Generation
    Zhao, Kun
    Ding, Hongwei
    Ye, Kai
    Cui, Xiaohui
    ENTROPY, 2021, 23 (10)
  • [38] PlaceFormer: Transformer-Based Visual Place Recognition Using Multi-Scale Patch Selection and Fusion
    Kannan, Shyam Sundar
    Min, Byung-Cheol
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (07) : 6552 - 6559
  • [39] Transformer-based Image Compression with Variable Image Quality Objectives
    Kao, Chia-Hao
    Chen, Yi-Hsin
    Chien, Cheng
    Chiu, Wei-Chen
    Peng, Wen-Hsiao
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1718 - 1725
  • [40] Tjong: A transformer-based Mahjong AI via hierarchical decision-making and fan backward
    Li, Xiali
    Liu, Bo
    Wei, Zhi
    Wang, Zhaoqi
    Wu, Licheng
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2024, 9 (04) : 982 - 995