Swin-GAN: generative adversarial network based on shifted windows transformer architecture for image generation

Cited by: 5

Authors:

Wang, Shibin [1]

Gao, Zidiao [1]

Liu, Dong [1]

Affiliation:

[1] Henan Normal Univ, Sch Comp & Informat Engn, Xinxiang 453007, Henan, Peoples R China

Source:

VISUAL COMPUTER | 2023, Vol. 39, No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

GAN; Transformer; Self-attention; Image generation;

DOI:

10.1007/s00371-022-02714-9

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

It is well known that successful generative adversarial networks (GANs) typically rely on convolutional neural network (CNN)-based generators and discriminators. However, a CNN cannot readily model long-range dependencies because the convolution operator has a local receptive field, which causes problems for GANs such as optimization difficulty and the loss of feature resolution and fine details. To address the long-range dependency problem, we propose a GAN model based on the shifted windows Transformer architecture, called Swin-GAN, in which the CNN architecture is replaced by a Transformer. In our model, we build a memory-friendly generator based on the shifted window attention mechanism that gradually increases the resolution of the feature maps at each stage. In addition, we build a multi-scale discriminator that splits the image into patches of different sizes as the input at different stages, which balances capturing global contextual semantic information against local detailed features. To further improve fidelity and stability, we use techniques such as data augmentation, layer normalization and relative position encoding. Experimental results show that, compared with current schemes, our scheme has better performance, fewer parameters and lower computational cost. Specifically, the Swin-GAN model has 30.254M parameters and requires 4.086G floating-point operations (FLOPs). On CIFAR-10, the Inception Score (IS) is 9.04 and the Frechet Inception Distance (FID) is 9.23.
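The shifted window idea named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only shows, on a toy 4x4 grid of token indices, how a regular window partition and a cyclically shifted partition group tokens differently, so that the shifted windows straddle the borders of the regular ones. The function names (`cyclic_shift`, `window_partition`), the grid size and the window size are all illustrative assumptions, and the attention computation itself is omitted.

```python
from typing import List

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and to the left by `shift` cells, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
        for r in range(h)
    ]


def window_partition(grid: Grid, window: int) -> List[Grid]:
    """Split the grid into non-overlapping window x window blocks (row-major)."""
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    return [
        [row[c:c + window] for row in grid[r:r + window]]
        for r in range(0, h, window)
        for c in range(0, w, window)
    ]


# Toy 4x4 "feature map" whose entries are flat token indices 0..15 (assumed sizes).
feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular partition: attention would be computed inside each 2x2 window.
regular = window_partition(feature_map, 2)

# Shifted partition: shift by half the window size, then partition again.
shifted = window_partition(cyclic_shift(feature_map, 1), 2)

print(regular[0])  # tokens grouped by the first regular window
print(shifted[0])  # tokens grouped by the first shifted window
```

In this toy case the first shifted window contains one token from each of the four regular windows, which is how alternating the two partitions lets information cross window borders. Windows produced by the wrap-around mix tokens from opposite edges of the grid; Swin-style models usually mask attention between such tokens, a step this sketch leaves out.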


Pages: 6085 - 6095

Page count: 11

50 entries in total

[41] A transformer generative adversarial network for multi-track music generation
Jin, Cong
Wang, Tao
Li, Xiaobing
Tie, Chu Jie Jiessie
Tie, Yun
Liu, Shan
Yan, Ming
Li, Yongzhi
Wang, Junxian
Huang, Shenze
CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2022, 7 (03) : 369 - 380
[42] LC-GAN: Image-to-image Translation Based on Generative Adversarial Network for Endoscopic Images
Lin, Shan
Qin, Fangbo
Li, Yangming
Bly, Randall A.
Moe, Kris S.
Hannaford, Blake
2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 2914 - 2920
[43] EL-GAN: Edge-Enhanced Generative Adversarial Network for Layout-to-Image Generation
Gao, Lin
Wu, Lei
Meng, Xiangxu
COMPUTER GRAPHICS FORUM, 2022, 41 (07) : 407 - 418
[44] DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation
Huang, Mengqi
Mao, Zhendong
Wang, Penghui
Wang, Quan
Zhang, Yongdong
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4345 - 4354
[45] FISS GAN: A Generative Adversarial Network for Foggy Image Semantic Segmentation
Liu, Kunhua
Ye, Zihao
Guo, Hongyan
Cao, Dongpu
Chen, Long
Wang, Fei-Yue
IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2021, 8 (08) : 1428 - 1439
[46] Network Traffic Anomaly Detection Based on Generative Adversarial Network and Transformer
Wang, Zhurong
Zhou, Jing
Hei, Xinhong
ADVANCES IN NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, ICNC-FSKD 2022, 2023, 153 : 228 - 235
[48] DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for Text-to-Image Generation
Wang, Zhiwei
Yang, Jing
Cui, Jiajun
Liu, Jiawei
Wang, Jiahao
COMPUTER VISION - ACCV 2022, PT VII, 2023, 13847 : 3 - 19
[49] Task-GAN: Improving Generative Adversarial Network for Image Reconstruction
Ouyang, Jiahong
Wang, Guanhua
Gong, Enhao
Chen, Kevin
Pauly, John
Zaharchuk, Greg
MACHINE LEARNING FOR MEDICAL IMAGE RECONSTRUCTION, MLMIR 2019, 2019, 11905 : 193 - 204
[50] Single Image Desnow Based on Vision Transformer and Conditional Generative Adversarial Network for Internet of Vehicles
Wei, Bingcai
Wang, Di
Wang, Zhuang
Zhang, Liye
CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2023, 137 (02) : 1975 - 1988
