KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

Authors
Xu, Zhihao [1]
Gong, Shengjie [1]
Tang, Jiapeng [2]
Liang, Lingyu [3]
Huang, Yining [3]
Li, Haojie [1]
Huang, Shuangping [1, 3]
Affiliations
[1] South China University of Technology, Guangzhou, People's Republic of China
[2] Technical University of Munich, Munich, Germany
[3] Pazhou Lab, Guangzhou, People's Republic of China
Source
Funding
National Natural Science Foundation of China;
Keywords
Speech-driven; 3D Facial Animation; Key Motion;
DOI
10.1007/978-3-031-72992-8_14
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings. Despite recent advancements in data-driven techniques, accurately mapping between audio signals and 3D facial meshes remains challenging. Direct regression of the entire sequence often leads to over-smoothed results due to the ill-posed nature of the problem. To this end, we propose a progressive learning mechanism that generates 3D facial animations by introducing key motion capture to decrease cross-modal mapping uncertainty and learning complexity. Concretely, our method integrates linguistic and data-driven priors through two modules: the linguistic-based key motion acquisition and the cross-modal motion completion. The former identifies key motions and learns the associated 3D facial expressions, ensuring accurate lip-speech synchronization. The latter extends key motions into a full sequence of 3D talking faces guided by audio features, improving temporal coherence and audio-visual consistency. Extensive experimental comparisons against existing state-of-the-art methods demonstrate the superiority of our approach in generating more vivid and consistent talking face animations. Consistent enhancements in results through the integration of our proposed learning scheme with existing methods underscore the efficacy of our approach.
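The two-stage idea in the abstract (first pick key motions, then complete the frames between them) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the key-frame rule (the first frame of each phoneme run, plus the final frame) is hypothetical, and plain linear interpolation stands in for the learned, audio-guided cross-modal completion module. The function names `select_key_frames` and `complete_motion` are likewise invented for this example.

```python
from typing import List, Sequence


def select_key_frames(phoneme_labels: Sequence[str]) -> List[int]:
    """Hypothetical key-frame rule: first frame of each phoneme run, plus the last frame."""
    if not phoneme_labels:
        return []
    keys = [
        i for i, p in enumerate(phoneme_labels)
        if i == 0 or p != phoneme_labels[i - 1]
    ]
    last = len(phoneme_labels) - 1
    if keys[-1] != last:
        keys.append(last)
    return keys


def complete_motion(
    key_idx: Sequence[int], key_motion: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Stand-in for motion completion: linearly interpolate between key motions.

    Assumes key_idx is sorted, starts at frame 0, and ends at the last frame.
    The paper describes a learned module guided by audio features instead.
    """
    if not key_motion:
        return []
    frames: List[List[float]] = []
    pairs = zip(zip(key_idx, key_motion), zip(key_idx[1:], key_motion[1:]))
    for (i0, m0), (i1, m1) in pairs:
        span = i1 - i0
        for t in range(span):
            w = t / span  # interpolation weight within this key-to-key segment
            frames.append([(1 - w) * a + w * b for a, b in zip(m0, m1)])
    frames.append(list(key_motion[-1]))  # final key frame closes the sequence
    return frames


# Usage: six audio frames with per-frame phoneme labels, 1-D motion for brevity.
labels = ["a", "a", "b", "b", "b", "c"]
keys = select_key_frames(labels)            # key frames at phoneme changes
key_motion = [[0.0], [1.0], [4.0]]          # one (toy) motion vector per key frame
full_sequence = complete_motion(keys, key_motion)
```

The split mirrors the motivation stated in the abstract: regressing only a sparse set of key motions first narrows the mapping problem, and the dense sequence is filled in afterwards.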
Pages: 236-253
Page count: 18