Text2Video: Automatic Video Generation Based on Text Scripts

Cited by: 3

Authors:

Yu, Yipeng [1]

Tu, Zirui [1]

Lu, Longyu [1]

Chen, Xiao [1]

Zhan, Hui [1]

Sun, Zixun [1]

Affiliation:

[1] Tencent, Interact Entertainment Grp, Shanghai, Peoples R China

Source:

PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021

Keywords:

text2video; video generation; video editing; video dubbing;

DOI:

10.1145/3474085.3478548

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

To make video creation simpler, in this paper we present Text2Video, a novel system to automatically produce videos using only text-editing for novice users. Given an input text script, the director-like system can generate game-related engaging videos which illustrate the given narrative, provide diverse multi-modal content, and follow video editing guidelines. The system involves five modules: (1) A material manager extracts highlights from raw live game videos, and tags each video highlight, image and audio with labels. (2) A natural language processor extracts entities and semantics from the input text scripts. (3) A refined cross-modal retrieval searches for matching candidate shots from the material manager. (4) A text-to-speech speaker reads the processed text scripts with a synthesized human voice. (5) The selected material shots and synthesized speech are assembled artistically through appropriate video editing techniques.
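As a rough illustration of how the five modules in the abstract might fit together, the following is a minimal toy sketch, not the authors' implementation. Every name in it (`Shot`, `extract_entities`, `retrieve`, `synthesize_speech`, `assemble`, `text2video`) is hypothetical, and each stage is replaced by a trivial stand-in: keyword matching for the natural language processor, tag overlap for cross-modal retrieval, and placeholder strings for speech synthesis and editing.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """A tagged material item, as a material manager (module 1) might store it."""
    name: str
    tags: set


def extract_entities(script, vocabulary):
    """Stand-in for module 2: keep the script words that appear in the tag vocabulary."""
    words = {w.strip(".,!?").lower() for w in script.split()}
    return words & vocabulary


def retrieve(entities, library, top_k=2):
    """Stand-in for module 3: rank shots by how many tags overlap the entities."""
    scored = [(len(entities & shot.tags), shot) for shot in library]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps library order on ties
    return [shot for _, shot in scored[:top_k]]


def synthesize_speech(script):
    """Stand-in for module 4: return a placeholder describing the narration."""
    return f"<speech:{len(script.split())} words>"


def assemble(shots, speech):
    """Stand-in for module 5: pair the chosen shots with the narration track."""
    return {"timeline": [shot.name for shot in shots], "audio": speech}


def text2video(script, library):
    """Run the toy pipeline end to end on one text script."""
    vocabulary = set().union(*(shot.tags for shot in library))
    entities = extract_entities(script, vocabulary)
    shots = retrieve(entities, library)
    speech = synthesize_speech(script)
    return assemble(shots, speech)


if __name__ == "__main__":
    library = [
        Shot("clip_a", {"hero", "battle"}),
        Shot("clip_b", {"dragon"}),
        Shot("clip_c", {"battle"}),
    ]
    print(text2video("The hero wins the final battle.", library))
```

The sketch only shows the data flow: the script feeds both the retrieval path and the narration path, and the two results meet in the final assembly step. A real system would presumably use learned models at each stage.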

Pages: 2753-2755

Page count: 3
