Learning Speech-driven 3D Conversational Gestures from Video

Cited by: 32

Authors:

Habibie, Ikhsanul [1]

Xu, Weipeng [2]

Mehta, Dushyant [1]

Liu, Lingjie [1]

Seidel, Hans-Peter [1]

Pons-Moll, Gerard [3]

Elgharib, Mohamed [1]

Theobalt, Christian [1]

Affiliations:

[1] Max Planck Institute for Informatics, Saarbrücken, Germany

[2] Facebook Reality Labs, Redmond, WA, USA

[3] University of Tübingen, Tübingen, Germany

Source:

PROCEEDINGS OF THE 21ST ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS (IVA) | 2021

Keywords:

gesture synthesis; character control; audio-driven pose estimation

DOI:

10.1145/3472306.3478335

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose the first approach to synthesize the synchronous 3D conversational body and hand gestures, as well as 3D face and head animations, of a virtual character from speech input. Our algorithm uses a CNN architecture that leverages the inherent correlation between facial expression and hand gestures. Synthesis of conversational body gestures is a multi-modal problem since many similar gestures can plausibly accompany the same input speech. To synthesize plausible body gestures in this setting, we train a Generative Adversarial Network (GAN) based model that measures the plausibility of the generated sequences of 3D body motion when paired with the input audio features. We also contribute a new corpus that contains more than 33 hours of annotated data from in-the-wild videos of talking people. To this end, we apply state-of-the-art monocular approaches for 3D body and hand pose estimation as well as 3D face performance capture to the video corpus. In this way, we can train on orders of magnitude more data than previous algorithms that resort to complex in-studio motion capture solutions, and thereby train more expressive synthesis algorithms. Our experiments and user study show the state-of-the-art quality of our speech-synthesized full 3D character animations.

Citation

Pages: 101-108

Page count: 8
