KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

被引:0
|
作者
Xu, Zhihao [1 ]
Gong, Shengjie [1 ]
Tang, Jiapeng [2 ]
Liang, Lingyu [3 ]
Huang, Yining [3 ]
Li, Haojie [1 ]
Huang, Shuangping [1 ,3 ]
机构
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] Tech Univ Munich, Munich, Germany
[3] Pazhou Lab, Guangzhou, Peoples R China
来源
基金
中国国家自然科学基金;
关键词
Speech-driven; 3D Facial Animation; Key Motion;
D O I
10.1007/978-3-031-72992-8_14
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings. Despite recent advancements in data-driven techniques, accurately mapping between audio signals and 3D facial meshes remains challenging. Direct regression of the entire sequence often leads to over-smoothed results due to the ill-posed nature of the problem. To this end, we propose a progressive learning mechanism that generates 3D facial animations by introducing key motion capture to decrease cross-modal mapping uncertainty and learning complexity. Concretely, our method integrates linguistic and data-driven priors through two modules: the linguistic-based key motion acquisition and the cross-modal motion completion. The former identifies key motions and learns the associated 3D facial expressions, ensuring accurate lip-speech synchronization. The latter extends key motions into a full sequence of 3D talking faces guided by audio features, improving temporal coherence and audio-visual consistency. Extensive experimental comparisons against existing state-of-the-art methods demonstrate the superiority of our approach in generating more vivid and consistent talking face animations. Consistent enhancements in results through the integration of our proposed learning scheme with existing methods underscore the efficacy of our approach.
引用
收藏
页码:236 / 253
页数:18
相关论文
共 50 条
  • [1] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
    Xing, Jinbo
    Xia, Menghan
    Zhang, Yuechen
    Cun, Xiaodong
    Wang, Jue
    Wong, Tien-Tsin
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 12780 - 12790
  • [2] Speech-Driven 3D Facial Animation with Mesh Convolution
    Ji, Xuejie
    Su, Zewei
    Dong, Lanfang
    Li, Guoming
    2022 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, COMPUTER VISION AND MACHINE LEARNING (ICICML), 2022, : 14 - 18
  • [3] Imitator: Personalized Speech-driven 3D Facial Animation
    Thambiraja, Balamurugan
    Habibie, Ikhsanul
    Aliakbarian, Sadegh
    Cosker, Darren
    Theobalt, Christian
    Thies, Justus
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 20564 - 20574
  • [4] Speech-driven 3D Facial Animation for Mobile Entertainment
    Yan, Juan
    Xie, Xiang
    Hu, Hao
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2334 - 2337
  • [5] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
    Fan, Yingruo
    Lin, Zhaojiang
    Saito, Jun
    Wang, Wenping
    Komura, Taku
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18749 - 18758
  • [6] CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning
    Zhang, Xitie
    Wu, Suping
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 1175 - 1179
  • [7] FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion
    Stan, Stefan
    Haque, Kazi Injamamul
    Yumak, Zerrin
    15TH ANNUAL ACM SIGGRAPH CONFERENCE ON MOTION, INTERACTION AND GAMES, MIG 2023, 2023,
  • [8] Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation
    Fu, Hui
    Wang, Zeqing
    Gong, Ke
    Wang, Keze
    Chen, Tianshui
    Li, Haojie
    Zeng, Haifeng
    Kang, Wenxiong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1770 - 1777
  • [9] Speech-Driven 3D Face Animation with Composite and Regional Facial Movements
    Wu, Haozhe
    Zhou, Songtao
    Jia, Jia
    Xing, Junliang
    Wen, Qi
    Wen, Xiang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6822 - 6830
  • [10] ANALYZING VISIBLE ARTICULATORY MOVEMENTS IN SPEECH PRODUCTION FOR SPEECH-DRIVEN 3D FACIAL ANIMATION
    Kim, Hyung Kyu
    Lee, Sangmin
    Kim, Hak Gu
    Proceedings - International Conference on Image Processing, ICIP, 2024, : 3575 - 3579