Improved Transformer for High-Resolution GANs

Citations: 0
Authors
Zhao, Long [1, 4]
Zhang, Zizhao [2]
Chen, Ting [3]
Metaxas, Dimitris N. [1]
Zhang, Han [3]
Affiliations
[1] Rutgers State Univ, New Brunswick, NJ 08901 USA
[2] Google Cloud AI, Mountain View, CA USA
[3] Google Res, Mountain View, CA USA
[4] Google Brain Team, Mountain View, CA 94043 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Attention-based models, exemplified by the Transformer, can effectively model long-range dependencies, but the quadratic complexity of the self-attention operation makes them difficult to adopt for high-resolution image generation based on Generative Adversarial Networks (GANs). In this paper, we introduce two key ingredients to the Transformer to address this challenge. First, in the low-resolution stages of the generative process, standard global self-attention is replaced with the proposed multi-axis blocked self-attention, which allows efficient mixing of local and global attention. Second, in the high-resolution stages, we drop self-attention and keep only multi-layer perceptrons, reminiscent of the implicit neural function. To further improve performance, we introduce an additional self-modulation component based on cross-attention. The resulting model, denoted HiT, has nearly linear computational complexity with respect to image size and thus scales directly to synthesizing high-definition images. Our experiments show that HiT achieves state-of-the-art FID scores of 30.83 and 2.95 on unconditional ImageNet 128 × 128 and FFHQ 256 × 256, respectively, with reasonable throughput. We believe HiT is an important milestone for GAN generators that are completely free of convolutions. Our code is publicly available at https://github.com/google-research/hit-gan.
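To make the complexity argument in the abstract concrete, the following is a minimal sketch that counts query-key pairs for global self-attention versus a blocked scheme. It is not the authors' implementation: the function names, the two-axis partition (one local axis within a window, one global axis across windows), and the window size of 8 are all assumptions chosen for illustration.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every token attends to every token."""
    tokens = height * width
    return tokens * tokens


def blocked_attention_pairs(height: int, width: int, block: int) -> int:
    """Query-key pairs under a hypothetical two-axis blocked scheme.

    Assumption (not taken from the abstract): the feature map is tiled
    into block x block windows; each token attends to the tokens in its
    own window (local axis) and to the tokens at the same offset in
    every window (global axis).
    """
    if height % block or width % block:
        raise ValueError("block must divide both height and width")
    tokens = height * width
    per_window = block * block           # size of the local axis
    num_windows = tokens // per_window   # size of the global axis
    return tokens * (per_window + num_windows)


# Compare the two counts on square feature maps of growing size.
for side in (16, 32, 64):
    print(side,
          global_attention_pairs(side, side),
          blocked_attention_pairs(side, side, 8))
```

Under these assumptions, the blocked count is far smaller than the global one, but the cross-window term still grows with the number of windows. That fits the abstract's choice to use attention only in low-resolution stages and to drop it in high-resolution ones.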
Pages: 14