Scripted Video Generation With a Bottom-Up Generative Adversarial Network

Cited by: 12
Authors
Chen, Qi [1 ,2 ]
Wu, Qi [3 ]
Chen, Jian [1 ]
Wu, Qingyao [1 ]
van den Hengel, Anton [3 ]
Tan, Mingkui [1 ]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou 510640, Peoples R China
[2] Pazhou Lab, Guangzhou 510335, Peoples R China
[3] Univ Adelaide, Sch Comp Sci, Adelaide, SA 5005, Australia
Funding
National Natural Science Foundation of China
Keywords
Generative adversarial networks; video generation; semantic alignment; temporal coherence;
DOI
10.1109/TIP.2020.3003227
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Generating videos from a text description (such as a script) is non-trivial due to the intrinsic complexity of image frames and the temporal structure of videos. Although Generative Adversarial Networks (GANs) have been successfully applied to generate images conditioned on a natural language description, it remains very challenging to generate realistic videos, whose frames must exhibit both spatial and temporal coherence. In this paper, we propose a novel Bottom-up GAN (BoGAN) method for generating videos from a text description. To ensure the coherence of the generated frames and to make the whole video match the language description semantically, we design a bottom-up optimisation mechanism to train BoGAN. Specifically, we devise a region-level loss via an attention mechanism to preserve local semantic alignment and to draw details in different sub-regions of the video conditioned on the words most relevant to them. Moreover, to guarantee the matching between text and frame, we introduce a frame-level discriminator, which also maintains the fidelity of each frame and the coherence across frames. Finally, to ensure global semantic alignment between the whole video and the given text, we apply a video-level discriminator. We evaluate the effectiveness of the proposed BoGAN on two synthetic datasets (i.e., SBMG and TBMG) and two real-world datasets (i.e., MSVD and KTH).
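The abstract describes a bottom-up objective that combines three adversarial levels: a region-level loss for word-to-sub-region alignment, a frame-level discriminator for per-frame fidelity and text matching, and a video-level discriminator for global semantic alignment. The sketch below illustrates, in a minimal and purely hypothetical form, how such a three-level generator objective could be combined as a weighted sum; the function names, the binary cross-entropy formulation, and the weights are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def bce(scores, target):
    # Binary cross-entropy on discriminator scores assumed to lie in (0, 1).
    eps = 1e-7
    scores = np.clip(np.asarray(scores, dtype=float), eps, 1 - eps)
    return float(-np.mean(target * np.log(scores)
                          + (1 - target) * np.log(1 - scores)))

def bottom_up_generator_loss(region_scores, frame_scores, video_score,
                             w_region=1.0, w_frame=1.0, w_video=1.0):
    """Hypothetical combination of the three discriminator levels into one
    generator objective. The generator is rewarded when every discriminator
    outputs 1 ("real / matched")."""
    l_region = bce(region_scores, 1.0)          # word-to-sub-region alignment
    l_frame = bce(frame_scores, 1.0)            # per-frame fidelity + text match
    l_video = bce(np.array([video_score]), 1.0)  # whole-video semantic alignment
    return w_region * l_region + w_frame * l_frame + w_video * l_video
```

Under this sketch, an untrained generator whose discriminators all output 0.5 would incur a loss of 3·ln 2 with unit weights; the relative weights would control how strongly each level of coherence is enforced.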
Pages: 7454-7467
Number of pages: 14