Efficient and expressive high-resolution image synthesis via variational autoencoder-enriched transformers with sparse attention mechanisms

Authors
Tang, Bingyin [1]
Feng, Fan [2]
Affiliations
[1] Nanyang Inst Technol, Nanyang Res Inst Big Data, Nanyang, Peoples R China
[2] Nanyang Inst Technol, Nanyang, Peoples R China
Keywords
high-resolution image synthesis; variational autoencoders; transformers; sparse attention mechanisms; context-rich vocabulary; sequential image synthesis
DOI
10.1117/1.JEI.33.3.033002
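Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809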
Abstract
We introduce a method for efficient and expressive high-resolution image synthesis that combines variational autoencoders (VAEs) with transformers using sparse attention (SA) mechanisms. The VAEs establish a context-rich vocabulary of image constituents, capturing intricate image features more effectively than traditional techniques. We then employ SA mechanisms within our transformer model, improving computational efficiency on the long sequences inherent to high-resolution images. Extending beyond traditional conditional synthesis, our model integrates both nonspatial and spatial information and also incorporates temporal dynamics, enabling sequential image synthesis. Through experiments, we demonstrate the method's effectiveness in semantically guided synthesis of megapixel images. Our findings support this method as a contribution to the field of high-resolution image synthesis.
Pages: 15