Text2Video: Text-driven facial animation using MPEG-4

Citations: 0
Authors
Rurainsky, J [1 ]
Eisert, P [1 ]
Affiliation
[1] Heinrich Hertz Inst Nachrichtentech Berlin GmbH, Fraunhofer Inst Telecommun, Image Proc Dept, D-10587 Berlin, Germany
Keywords
MPEG-4; facial animation; text-driven animation; SMS; MMS
DOI
10.1117/12.631413
Chinese Library Classification
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
We present a complete system for the automatic creation of talking-head video sequences from text messages. Our system converts the text into MPEG-4 Facial Animation Parameters and synthetic voice. A user-selected 3D character performs lip movements synchronized to the speech data. The 3D models, each created from a single image, range from realistic people to cartoon characters. Voice selection for different languages and genders, together with a pitch-shift component, enables personalization of the animation. The animation can be shown on different displays and devices, ranging from 3GPP players on mobile phones to real-time 3D render engines. Our system can therefore be used in mobile communication to convert regular SMS messages into MMS animations.
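The text-to-animation step described in the abstract can be illustrated with a minimal sketch: given phonemes with timings (as a speech synthesizer might report them), assign one mouth-shape index to each video frame. This is not the authors' implementation. The `PHONEME_TO_VISEME` table, the `TimedPhoneme` type, the `viseme_track` function, and the 25 fps default are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-viseme table, for illustration only.
# It is not taken from the paper and is not a complete phoneme set.
PHONEME_TO_VISEME = {
    "p": 1, "b": 1, "m": 1,
    "f": 2, "v": 2,
    "t": 4, "d": 4,
    "k": 5, "g": 5,
    "s": 7, "z": 7,
    "n": 8, "l": 8,
    "a": 10, "e": 11, "i": 12, "o": 13, "u": 14,
}
SILENCE = 0  # index used for pauses and unknown symbols


@dataclass
class TimedPhoneme:
    symbol: str
    start: float  # seconds
    end: float    # seconds


def viseme_track(phonemes, fps=25):
    """Return one viseme index per video frame (assumed frame rate: fps)."""
    if not phonemes:
        return []
    duration = max(p.end for p in phonemes)
    n_frames = int(round(duration * fps))
    track = [SILENCE] * n_frames
    for p in phonemes:
        first = int(round(p.start * fps))
        last = min(int(round(p.end * fps)), n_frames)
        viseme = PHONEME_TO_VISEME.get(p.symbol, SILENCE)
        for frame in range(first, last):
            track[frame] = viseme
    return track


if __name__ == "__main__":
    # "ma": 80 ms of "m" followed by 120 ms of "a"
    print(viseme_track([TimedPhoneme("m", 0.0, 0.08),
                        TimedPhoneme("a", 0.08, 0.2)]))
```

A real system would additionally blend between neighbouring mouth shapes and encode the result as a Facial Animation Parameter stream; the sketch only shows the frame-alignment idea.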
Pages: 492-500
Page count: 9
Related papers
50 in total
  • [1] TEXT2VIDEO: TEXT-DRIVEN TALKING-HEAD VIDEO SYNTHESIS WITH PERSONALIZED PHONEME - POSE DICTIONARY
    Zhang, Sibo
    Yuan, Jiahong
    Liao, Miao
    Zhang, Liangjun
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2659 - 2663
  • [2] MPEG-4 facial animation in video analysis and synthesis
    Eisert, P
    [J]. INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2003, 13 (05) : 245 - 256
  • [3] Text-Driven Video Prediction
    Song, Xue
    Chen, Jingjing
    Zhu, Bin
    Jiang, Yu-Gang
    [J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2024, 20 (09)
  • [4] Text-driven Speech Animation with Emotion Control
    Chae, Wonseok
    Kim, Yejin
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2020, 14 (08) : 3473 - 3487
  • [5] An MPEG-4 facial animation system driven by synthetic speech
    Lande, C
    Francini, G
    [J]. 1998 MULTIMEDIA MODELING, PROCEEDINGS, 1998, : 203 - 212
  • [6] Text2Performer: Text-Driven Human Video Generation
    Jiang, Yuming
    Yang, Shuai
    Koh, Tong Liang
    Wu, Wayne
    Loy, Chen Change
    Liu, Ziwei
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22690 - 22700
  • [7] Text2LIVE: Text-Driven Layered Image and Video Editing
    Bar-Tal, Omer
    Ofri-Amar, Dolev
    Fridman, Rafail
    Kasten, Yoni
    Dekel, Tali
    [J]. COMPUTER VISION - ECCV 2022, PT XV, 2022, 13675 : 707 - 723
  • [8] Text2Video: Automatic Video Generation Based on Text Scripts
    Yu, Yipeng
    Tu, Zirui
    Lu, Longyu
    Chen, Xiao
    Zhan, Hui
    Sun, Zixun
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2753 - 2755
  • [9] Lip tracking for MPEG-4 facial animation
    Wu, ZL
    Aleksic, PS
    Katsaggelos, AK
    [J]. FOURTH IEEE INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, PROCEEDINGS, 2002, : 293 - 298
  • [10] Optimizing facial animation parameters for MPEG-4
    Hovden, G
    Ling, N
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2003, 49 (04) : 1354 - 1359