Hierarchical Image Generation via Transformer-Based Sequential Patch Selection

Authors
Xu, Xiaogang [1]
Xu, Ning [2]
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Adobe Res, San Jose, CA USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To synthesize images with preferred objects and interactions, a controllable approach is to generate the image from a scene graph and a large pool of object crops, where the spatial arrangement of the objects in the image is defined by the scene graph while their appearance is determined by the crops retrieved from the pool. In this paper, we propose a novel framework built on this semi-parametric generation strategy. First, to encourage the retrieval of mutually compatible crops, we design a sequential selection strategy in which the crop selected for each object is determined by the contents and locations of all previously chosen object crops. This process is implemented via a transformer trained with contrastive losses. Second, to generate the final image, our hierarchical generation strategy combines hierarchical gated convolutions, which synthesize the areas not covered by any image crop, with a patch-guided spatially adaptive normalization module, which ensures that the final generated images comply with both the crop appearance and the scene graph. Evaluated on the challenging Visual Genome and COCO-Stuff datasets, our experimental results demonstrate the superiority of our proposed method over existing state-of-the-art methods.
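
The sequential selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the paper uses a transformer trained with contrastive losses and conditions on crop locations as well as contents, whereas the sketch below swaps in a plain average cosine similarity over hypothetical crop embeddings and ignores locations. The function names, the embedding format, and the rule of taking the first candidate when nothing has been chosen yet are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_crops_sequentially(candidates_per_object):
    """Greedily pick one crop per object, conditioning on earlier picks.

    candidates_per_object: list with one entry per object; each entry is a
    list of candidate crop embeddings (hypothetical feature vectors).
    Returns the index of the chosen candidate for each object, in order.
    """
    chosen = []  # chosen[i] is the candidate index picked for object i
    for candidates in candidates_per_object:
        if not chosen:
            # Assumption: with no context yet, take the first candidate.
            best = 0
        else:
            # Stand-in compatibility score: mean cosine similarity to all
            # previously chosen crops (the paper uses a learned transformer).
            context = [candidates_per_object[i][j] for i, j in enumerate(chosen)]

            def score(emb):
                return sum(cosine(emb, c) for c in context) / len(context)

            best = max(range(len(candidates)), key=lambda k: score(candidates[k]))
        chosen.append(best)
    return chosen


# Toy pool: three objects with 1, 2, and 2 candidate crops respectively.
pool = [
    [[1.0, 0.0]],
    [[0.0, 1.0], [1.0, 0.1]],
    [[0.5, 0.5], [0.9, 0.0]],
]
print(select_crops_sequentially(pool))
```

In this toy pool, each later object picks the candidate whose embedding is closest on average to the crops already chosen, which mirrors the abstract's point that each selection depends on all previous ones.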
Pages: 2938-2945
Number of pages: 8