KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

Authors
Xu, Zhihao [1]
Gong, Shengjie [1]
Tang, Jiapeng [2]
Liang, Lingyu [3]
Huang, Yining [3]
Li, Haojie [1]
Huang, Shuangping [1, 3]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] Tech Univ Munich, Munich, Germany
[3] Pazhou Lab, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech-driven; 3D Facial Animation; Key Motion;
DOI
10.1007/978-3-031-72992-8_14
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings. Despite recent advancements in data-driven techniques, accurately mapping between audio signals and 3D facial meshes remains challenging. Direct regression of the entire sequence often leads to over-smoothed results due to the ill-posed nature of the problem. To this end, we propose a progressive learning mechanism that generates 3D facial animations by introducing key motion capture to decrease cross-modal mapping uncertainty and learning complexity. Concretely, our method integrates linguistic and data-driven priors through two modules: the linguistic-based key motion acquisition and the cross-modal motion completion. The former identifies key motions and learns the associated 3D facial expressions, ensuring accurate lip-speech synchronization. The latter extends key motions into a full sequence of 3D talking faces guided by audio features, improving temporal coherence and audio-visual consistency. Extensive experimental comparisons against existing state-of-the-art methods demonstrate the superiority of our approach in generating more vivid and consistent talking face animations. Consistent enhancements in results through the integration of our proposed learning scheme with existing methods underscore the efficacy of our approach.
Pages: 236-253
Page count: 18