Learning Speech-driven 3D Conversational Gestures from Video

被引:32
|
作者
Habibie, Ikhsanul [1 ]
Xu, Weipeng [2 ]
Mehta, Dushyant [1 ]
Liu, Lingjie [1 ]
Seidel, Hans-Peter [1 ]
Pons-Moll, Gerard [3 ]
Elgharib, Mohamed [1 ]
Theobalt, Christian [1 ]
机构
[1] Max Planck Inst Informat, Saarbrucken, Germany
[2] Facebook Real Labs, Redmond, WA USA
[3] Univ Tubingen, Tubingen, Germany
关键词
gesture synthesis; character control; audio-driven pose estimation;
D O I
10.1145/3472306.3478335
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We propose the first approach to synthesize the synchronous 3D conversational body and hand gestures, as well as 3D face and head animations, of a virtual character from speech input. Our algorithm uses a CNN architecture that leverages the inherent correlation between facial expression and hand gestures. Synthesis of conversational body gestures is a multi-modal problem since many similar gestures can plausibly accompany the same input speech. To synthesize plausible body gestures in this setting, we train a Generative Adversarial Network (GAN) based model that measures the plausibility of the generated sequences of 3D body motion when paired with the input audio features. We also contribute a new corpus that contains more than 33 hours of annotated data from in-the-wild videos of talking people. To this end, we apply state-of-the-art monocular approaches for 3D body and hand pose estimation as well as 3D face performance capture to the video corpus. In this way, we can train on orders of magnitude more data than previous algorithms that resort to complex in-studio motion capture solutions, and thereby train more expressive synthesis algorithms. Our experiments and user study show the state-of-the-art quality of our speech-synthesized full 3D character animations.
引用
收藏
页码:101 / 108
页数:8
相关论文
共 50 条
  • [1] Speech-driven face synthesis from 3D video
    Ypsilos, LA
    Hilton, A
    Turkmani, A
    Jackson, PJB
    [J]. 2ND INTERNATIONAL SYMPOSIUM ON 3D DATA PROCESSING, VISUALIZATION, AND TRANSMISSION, PROCEEDINGS, 2004, : 58 - 65
  • [2] CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning
    Zhang, Xitie
    Wu, Suping
    [J]. PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 1175 - 1179
  • [3] Speech-Driven 3D Facial Animation with Mesh Convolution
    Ji, Xuejie
    Su, Zewei
    Dong, Lanfang
    Li, Guoming
    [J]. 2022 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, COMPUTER VISION AND MACHINE LEARNING (ICICML), 2022, : 14 - 18
  • [4] Speech-driven 3D Facial Animation for Mobile Entertainment
    Yan, Juan
    Xie, Xiang
    Hu, Hao
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2334 - 2337
  • [5] Imitator: Personalized Speech-driven 3D Facial Animation
    Thambiraja, Balamurugan
    Habibie, Ikhsanul
    Aliakbarian, Sadegh
    Cosker, Darren
    Theobalt, Christian
    Thies, Justus
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 20564 - 20574
  • [6] HMM BASED SPEECH-DRIVEN 3D TONGUE ANIMATION
    Luo, Changwei
    Yu, Jun
    Li, Xian
    Zhang, Leilei
    [J]. 2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 4377 - 4381
  • [7] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
    Fan, Yingruo
    Lin, Zhaojiang
    Saito, Jun
    Wang, Wenping
    Komura, Taku
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18749 - 18758
  • [8] 3D Visual passcode: Speech-driven 3D facial dynamics for behaviometrics
    Zhang, Jie
    Fisher, Robert B.
    [J]. SIGNAL PROCESSING, 2019, 160 : 164 - 177
  • [9] Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach
    Pham, Hai X.
    Cheung, Samuel
    Pavlovic, Vladimir
    [J]. 2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 2328 - 2336
  • [10] Automatic Quality Assessment of Speech-Driven Synthesized Gestures
    He, Zhiyuan
    [J]. INTERNATIONAL JOURNAL OF COMPUTER GAMES TECHNOLOGY, 2022, 2022