KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

Cited by: 0

Authors:

Xu, Zhihao ^{[1]}

Gong, Shengjie ^{[1]}

Tang, Jiapeng ^{[2]}

Liang, Lingyu ^{[3]}

Huang, Yining ^{[3]}

Li, Haojie ^{[1]}

Huang, Shuangping ^{[1,3]}

Affiliations:

[1] South China Univ Technol, Guangzhou, Peoples R China

[2] Tech Univ Munich, Munich, Germany

[3] Pazhou Lab, Guangzhou, Peoples R China

Source:

COMPUTER VISION - ECCV 2024, PT LVI | 2025 / Vol. 15114

Funding:

National Natural Science Foundation of China;

Keywords:

Speech-driven; 3D Facial Animation; Key Motion;

DOI:

10.1007/978-3-031-72992-8_14

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings. Despite recent advancements in data-driven techniques, accurately mapping between audio signals and 3D facial meshes remains challenging. Direct regression of the entire sequence often leads to over-smoothed results due to the ill-posed nature of the problem. To this end, we propose a progressive learning mechanism that generates 3D facial animations by introducing key motion capture to decrease cross-modal mapping uncertainty and learning complexity. Concretely, our method integrates linguistic and data-driven priors through two modules: the linguistic-based key motion acquisition and the cross-modal motion completion. The former identifies key motions and learns the associated 3D facial expressions, ensuring accurate lip-speech synchronization. The latter extends key motions into a full sequence of 3D talking faces guided by audio features, improving temporal coherence and audio-visual consistency. Extensive experimental comparisons against existing state-of-the-art methods demonstrate the superiority of our approach in generating more vivid and consistent talking face animations. Consistent enhancements in results through the integration of our proposed learning scheme with existing methods underscore the efficacy of our approach.

Pages: 236-253

Page count: 18
