Image-to-Video Translation Using a VAE-GAN with Refinement Network

Cited by: 0
Authors
Wang, Shengli [1]
Xieshi, Mulin [2]
Zhou, Zhangpeng [1]
Zhang, Xiang [2]
Liu, Xujie [2]
Tang, Zeyi [2]
Xiahou, Jianbing [3]
Lin, Pingyuan [3]
Xu, Xuexin [3]
Dai, Yuxing [3]
Affiliations
[1] Maintenance Co State Grid Power Co Gansu Prov, Lanzhou 730000, Gansu, Peoples R China
[2] State Grid Infotelecom Great Power Sci & Technol, Fuzhou 350000, Peoples R China
[3] Xiamen Univ, Sch Informat, Xiamen 361005, Peoples R China
Keywords
Video generation; Variational autoencoder; Generative adversarial network; Refinement network
DOI
10.1007/978-3-031-13870-6_42
Chinese Library Classification
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
With the development of deep learning, many image processing techniques have emerged in computer vision in recent years and perform well in a variety of application scenarios. Video prediction tasks typically take multiple consecutive frames before and after a gap as input and predict the missing frames in between. In contrast, the image-to-video generation task proposed in this paper does not require multiple consecutive frames: it generates content in a specified direction from the first frame together with an embedding vector of motion features. To address problems in generated videos such as incoherence, frame loss, and blurring, this paper introduces a new network architecture. For multiple image-to-video translation tasks, we propose VAE-RGAN, a VAE-GAN with an additional refinement network. The refinement network, together with a new identity matching loss and a connected feature matching loss, is used to eliminate the respective shortcomings of the VAE and the GAN and to enhance the visual quality of the generated videos. We conduct a wide range of qualitative and quantitative experiments on the Weizmann dataset. We draw the following conclusions from this empirical study: (1) compared with state-of-the-art approaches, our approach (VAE-RGAN) exhibits significant improvements in generative capability; (2) experiments show that the designed VAE-RGAN structure achieves better results and that the refinement network significantly reduces blurring.
Pages: 494-505
Page count: 12
Related papers
50 in total
  • [1] Two-Channel VAE-GAN Based Image-To-Video Translation
    Wang, Shengli
    Xieshi, Mulin
    Zhou, Zhangpeng
    Zhang, Xiang
    Liu, Xujie
    Tang, Zeyi
    Dai, Yuxing
    Xu, Xuexin
    Lin, Pingyuan
    INTELLIGENT COMPUTING THEORIES AND APPLICATION (ICIC 2022), PT I, 2022, 13393 : 430 - 443
  • [2] Ising granularity image analysis on VAE-GAN
    Chen, Guoming
    Long, Shun
    Yuan, Zeduo
    Zhu, Weiheng
    Chen, Qiang
    Wu, Yilin
    MACHINE VISION AND APPLICATIONS, 2022, 33 (06)
  • [3] Attention-Based Image-to-Video Translation for Synthesizing Facial Expression Using GAN
    Alemayehu, Kidist
    Jifara, Worku
    Jobir, Demissie
    JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING, 2023, 2023
  • [4] Functional brain network identification and fMRI augmentation using a VAE-GAN framework
    Qiang, Ning
    Gao, Jie
    Dong, Qinglin
    Yue, Huiji
    Liang, Hongtao
    Liu, Lili
    Yu, Jingjing
    Hu, Jing
    Zhang, Shu
    Ge, Bao
    Sun, Yifei
    Liu, Zhengliang
    Liu, Tianming
    Li, Jin
    Song, Hujie
    Zhao, Shijie
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 165
  • [5] Nonparallel Emotional Speech Conversion Using VAE-GAN
    Cao, Yuexin
    Liu, Zhengchen
    Chen, Minchuan
    Ma, Jun
    Wang, Shaojun
    Xiao, Jing
    INTERSPEECH 2020, 2020, : 3406 - 3410
  • [6] ImUnity: A generalizable VAE-GAN solution for multicenter MR image harmonization
    Cackowski, Stenzel
    Barbier, Emmanuel L.
    Dojat, Michel
    Christen, Thomas
    MEDICAL IMAGE ANALYSIS, 2023, 88
  • [7] Facies conditional simulation based on VAE-GAN model and image quilting algorithm
    Zhao, Jichuan
    Chen, Shuangquan
    JOURNAL OF APPLIED GEOPHYSICS, 2023, 219
  • [8] Facial Image-to-Video Translation by a Hidden Affine Transformation
    Shen, Guangyao
    Huang, Wenbing
    Gan, Chuang
    Tan, Mingkui
    Huang, Junzhou
    Zhu, Wenwu
    Gong, Boqing
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2505 - 2513
  • [9] HyperCon: Image-To-Video Model Transfer for Video-To-Video Translation Tasks
    Szeto, Ryan
    El-Khamy, Mostafa
    Lee, Jungwon
    Corso, Jason J.
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 3079 - 3088
  • [10] Stochastic reconstruction of digital cores using two-discriminator VAE-GAN
    Zhang, Ting
    Shen, Tong
    Hu, Guangshun
    Lu, Fangfang
    Du, Xin
    GEOENERGY SCIENCE AND ENGINEERING, 2024, 236