Text2Video: Automatic Video Generation Based on Text Scripts

Cited by: 3
Authors
Yu, Yipeng [1]
Tu, Zirui [1]
Lu, Longyu [1]
Chen, Xiao [1]
Zhan, Hui [1]
Sun, Zixun [1]
Affiliations
[1] Tencent, Interactive Entertainment Group, Shanghai, People's Republic of China
Keywords
text2video; video generation; video editing; video dubbing
DOI
10.1145/3474085.3478548
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To simplify video creation, this paper presents Text2Video, a novel system that lets novice users produce videos automatically through text editing alone. Given an input text script, the director-like system generates engaging game-related videos that illustrate the given narrative, provide diverse multi-modal content, and follow video editing guidelines. The system comprises five modules: (1) A material manager extracts highlights from raw live game videos and tags each video highlight, image, and audio clip with labels. (2) A natural language processor extracts entities and semantics from the input text script. (3) A refined cross-modal retrieval module searches the material manager for matching candidate shots. (4) A text-to-speech speaker reads the processed text script in a synthesized human voice. (5) The selected material shots and synthesized speech are assembled artistically using appropriate video editing techniques.
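The five-module flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`Material`, `extract_entities`, `retrieve`, `build_timeline`) is hypothetical, entity extraction is reduced to vocabulary lookup, cross-modal retrieval is reduced to label overlap, and text-to-speech and final editing are replaced by placeholder file names and a plain timeline list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    """A tagged clip, image, or audio item (stand-in for module 1 output)."""
    name: str
    labels: frozenset


def extract_entities(sentence, vocabulary):
    """Module 2 stand-in: keep the words found in the known label vocabulary."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return words & vocabulary


def retrieve(entities, library):
    """Module 3 stand-in: return the material sharing the most labels, or None."""
    best = max(library, key=lambda m: len(m.labels & entities), default=None)
    if best is None or not (best.labels & entities):
        return None
    return best


def build_timeline(script, library):
    """Modules 4 and 5 stand-in: pair each sentence with a shot and a speech file."""
    vocabulary = set().union(*(m.labels for m in library))
    timeline = []
    for index, sentence in enumerate(script):
        shot = retrieve(extract_entities(sentence, vocabulary), library)
        timeline.append({
            "text": sentence,
            "shot": shot.name if shot else None,
            # Placeholder name only; no audio is actually synthesized here.
            "speech": f"speech_{index:03d}.wav",
        })
    return timeline


# Hypothetical material library and script, for illustration only.
library = [
    Material("highlight_01.mp4", frozenset({"dragon", "fight"})),
    Material("victory.png", frozenset({"victory"})),
]
script = ["The team starts a dragon fight.", "Then they claim victory!"]
timeline = build_timeline(script, library)
```

In a real system each stand-in would be replaced by a learned component; the sketch only shows how the outputs of the earlier modules feed the assembly step.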
Pages: 2753-2755
Number of pages: 3