Video Generation from Text

Times Cited: 0
Authors
Li, Yitong [1 ,2 ]
Min, Martin Renqiang [2 ]
Shen, Dinghan [1 ,2 ]
Carlson, David [1 ]
Carin, Lawrence [1 ]
Affiliations
[1] Duke Univ, Durham, NC 27708 USA
[2] NEC Labs Amer, Princeton, NJ 08540 USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Generating videos from text has proven to be a significant challenge for existing generative models. We tackle this problem by training a conditional generative model to extract both static and dynamic information from text. This is manifested in a hybrid framework, employing a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN). The static features, called "gist," are used to sketch text-conditioned background color and object layout structure. Dynamic features are considered by transforming input text into an image filter. To obtain a large amount of data for training the deep-learning model, we develop a method to automatically create a matched text-video corpus from publicly available online videos. Experimental results show that the proposed framework generates plausible and diverse short-duration smooth videos, while accurately reflecting the input text information. It significantly outperforms baseline models that directly adapt text-to-image generation procedures to produce videos. Performance is evaluated both visually and by adapting the inception score used to evaluate image generation in GANs.
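The abstract describes a hybrid pipeline: a VAE-style branch turns the text into a static "gist" that sketches background and layout, the same text is converted into a convolutional filter that carries dynamic information, and a GAN generator combines both to produce short video clips. The following minimal PyTorch sketch only illustrates that data flow; all module names, tensor sizes, and the toy forward pass are illustrative assumptions and do not reproduce the authors' implementation.

```python
# Illustrative sketch (assumptions, not the authors' code) of the
# gist + text-to-filter idea summarized in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GistEncoder(nn.Module):
    """VAE-style branch: text embedding -> latent 'gist' sketching
    static information (background colour, object layout)."""
    def __init__(self, text_dim=256, gist_dim=64):
        super().__init__()
        self.mu = nn.Linear(text_dim, gist_dim)
        self.logvar = nn.Linear(text_dim, gist_dim)

    def forward(self, text_emb):
        mu, logvar = self.mu(text_emb), self.logvar(text_emb)
        # Standard VAE reparameterisation.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

class TextToFilter(nn.Module):
    """Maps the text embedding to a per-sample convolutional filter that
    injects dynamic (motion-related) information into the generator."""
    def __init__(self, text_dim=256, channels=16, ksize=3):
        super().__init__()
        self.channels, self.ksize = channels, ksize
        self.fc = nn.Linear(text_dim, channels * channels * ksize * ksize)

    def forward(self, text_emb, feat):
        # One filter bank per sample, applied via grouped convolution.
        b = text_emb.size(0)
        w = self.fc(text_emb).view(b * self.channels, self.channels,
                                   self.ksize, self.ksize)
        feat = feat.reshape(1, b * self.channels, *feat.shape[2:])
        out = F.conv2d(feat, w, padding=self.ksize // 2, groups=b)
        return out.reshape(b, self.channels, *out.shape[2:])

class VideoGenerator(nn.Module):
    """GAN-style generator: gist + text-filtered features -> T video frames."""
    def __init__(self, gist_dim=64, channels=16, frames=8, size=32):
        super().__init__()
        self.frames, self.size, self.channels = frames, size, channels
        self.to_feat = nn.Linear(gist_dim, channels * size * size)
        self.head = nn.Conv2d(channels, 3 * frames, 3, padding=1)

    def forward(self, gist, text_filter, text_emb):
        feat = self.to_feat(gist).view(-1, self.channels, self.size, self.size)
        feat = F.relu(text_filter(text_emb, feat))      # inject motion cues
        video = torch.tanh(self.head(feat))             # (B, 3*T, H, W)
        return video.view(-1, self.frames, 3, self.size, self.size)

if __name__ == "__main__":
    text_emb = torch.randn(2, 256)                      # stand-in text encoding
    enc, flt, gen = GistEncoder(), TextToFilter(), VideoGenerator()
    gist, mu, logvar = enc(text_emb)
    frames = gen(gist, flt, text_emb)
    print(frames.shape)                                 # torch.Size([2, 8, 3, 32, 32])
```

In the paper's framing, the gist constrains the static scene while the text-conditioned filter shapes frame-to-frame dynamics; the sketch keeps only that separation of roles and omits the adversarial and reconstruction losses.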
Pages: 7065-7072
Number of Pages: 8
Related Papers
50 entries in total
  • [1] Text2Video: Automatic Video Generation Based on Text Scripts
    Yu, Yipeng
    Tu, Zirui
    Lu, Longyu
    Chen, Xiao
    Zhan, Hui
    Sun, Zixun
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2753 - 2755
  • [2] Textbooks for the YouTube generation? A case study on the shift from text to video
    Granitz, Neil
    Kohli, Chiranjeev
    Lancellotti, Matthew P.
    [J]. JOURNAL OF EDUCATION FOR BUSINESS, 2021, 96 (05) : 299 - 307
  • [3] Video Generation from Text Employing Latent Path Construction for Temporal Modeling
    Mazaheri, Amir
    Shah, Mubarak
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 5010 - 5016
  • [4] Text2Performer: Text-Driven Human Video Generation
    Jiang, Yuming
    Yang, Shuai
    Koh, Tong Liang
    Wu, Wayne
    Loy, Chen Change
    Liu, Ziwei
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22690 - 22700
  • [5] Controllable Video Generation With Text-Based Instructions
    Koksal, Ali
    Ak, Kenan E.
    Sun, Ying
    Rajan, Deepu
    Lim, Joo Hwee
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 190 - 201
  • [6] A Benchmark for Controllable Text-Image-to-Video Generation
    Hu, Yaosi
    Luo, Chong
    Chen, Zhenzhong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1706 - 1719
  • [7] Sounding Video Generator: A Unified Framework for Text-Guided Sounding Video Generation
    Liu, Jiawei
    Wang, Weining
    Chen, Sihan
    Zhu, Xinxin
    Liu, Jing
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 141 - 153
  • [8] Alignment and Generation Adapter for Efficient Video-text Understanding
    Fang, Han
    Yang, Zhifei
    Wei, Yuhan
    Zang, Xianghao
    Ban, Chao
    Feng, Zerun
    He, Zhongjiang
    Li, Yongxiang
    Sun, Hao
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2783 - 2789
  • [9] Text-to-video Generation: Research Status, Progress and Challenges
    Deng, Zijun
    He, Xiangteng
    Peng, Yuxin
    [J]. Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2024, 46 (05): 1632 - 1644
  • [10] Face database generation based on text-video correlation
    Zeng, Dan
    Bao, Yixin
    Liu, Ke
    Zhao, Fan
    Tian, Qi
    [J]. NEUROCOMPUTING, 2016, 207 : 240 - 249