KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

Cited by: 0

Authors:

Xu, Zhihao ^{[1]}

Gong, Shengjie ^{[1]}

Tang, Jiapeng ^{[2]}

Liang, Lingyu ^{[3]}

Huang, Yining ^{[3]}

Li, Haojie ^{[1]}

Huang, Shuangping ^{[1,3]}

Affiliations:

[1] South China Univ Technol, Guangzhou, Peoples R China

[2] Tech Univ Munich, Munich, Germany

[3] Pazhou Lab, Guangzhou, Peoples R China

Source:

COMPUTER VISION - ECCV 2024, PT LVI | 2025 / Vol. 15114

Funding:

National Natural Science Foundation of China;

Keywords:

Speech-driven; 3D Facial Animation; Key Motion;

DOI:

10.1007/978-3-031-72992-8_14

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings. Despite recent advancements in data-driven techniques, accurately mapping between audio signals and 3D facial meshes remains challenging. Direct regression of the entire sequence often leads to over-smoothed results due to the ill-posed nature of the problem. To this end, we propose a progressive learning mechanism that generates 3D facial animations by introducing key motion capture to decrease cross-modal mapping uncertainty and learning complexity. Concretely, our method integrates linguistic and data-driven priors through two modules: the linguistic-based key motion acquisition and the cross-modal motion completion. The former identifies key motions and learns the associated 3D facial expressions, ensuring accurate lip-speech synchronization. The latter extends key motions into a full sequence of 3D talking faces guided by audio features, improving temporal coherence and audio-visual consistency. Extensive experimental comparisons against existing state-of-the-art methods demonstrate the superiority of our approach in generating more vivid and consistent talking face animations. Consistent enhancements in results through the integration of our proposed learning scheme with existing methods underscore the efficacy of our approach.


Pages: 236-253

Page count: 18

50 items in total

[41] Speech-Driven Facial Animation by LSTM-RNN for Communication Use
Nishimura, Ryosuke
Sakata, Nobuchika
Tominaga, Tomu
Hijikata, Yoshinori
Harada, Kensuke
Kiyokawa, Kiyoshi
2019 26TH IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES (VR), 2019, : 1102 - 1103
[42] Speech-driven animation with meaningful behaviors
Sadoughi, Najmeh
Busso, Carlos
SPEECH COMMUNICATION, 2019, 110 : 90 - 100
[43] Geometry-Guided Dense Perspective Network for Speech-Driven Facial Animation
Liu, Jingying
Hui, Binyuan
Li, Kun
Liu, Yunke
Lai, Yu-Kun
Zhang, Yuxiang
Liu, Yebin
Yang, Jingyu
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2022, 28 (12) : 4873 - 4886
[44] A comprehensive system for facial animation of generic 3D head models driven by speech
Terissi, Lucas D.
Cerda, Mauricio
Gomez, Juan C.
Hitschfeld-Kahler, Nancy
Girau, Bernard
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2013
[45] A comprehensive system for facial animation of generic 3D head models driven by speech
Lucas D Terissi
Mauricio Cerda
Juan C Gómez
Nancy Hitschfeld-Kahler
Bernard Girau
EURASIP Journal on Audio, Speech, and Music Processing, 2013
[46] Speech-driven face synthesis from 3D video
Ypsilos, LA
Hilton, A
Turkmani, A
Jackson, PJB
2ND INTERNATIONAL SYMPOSIUM ON 3D DATA PROCESSING, VISUALIZATION, AND TRANSMISSION, PROCEEDINGS, 2004, : 58 - 65
[47] FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning
Haque, Kazi Injamamul
Yumak, Zerrin
PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023, 2023, : 282 - 291
[48] Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation
Terissi, Lucas D.
Gomez, Juan Carlos
ADVANCES IN ARTIFICIAL INTELLIGENCE - SBIA 2008, PROCEEDINGS, 2008, 5249 : 33 - 42
[49] Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model
Deena, Salil
Galata, Aphrodite
ADVANCES IN VISUAL COMPUTING, PT 1, PROCEEDINGS, 2009, 5875 : 89 - 100
[50] 3D facial animation driven by speech-video dual-modal signals
Ji, Xuejie
Liao, Zhouzhou
Dong, Lanfang
Tang, Yingchao
Li, Guoming
Mao, Meng
COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (05) : 5951 - 5964
