Speech-Driven 3D Face Animation with Composite and Regional Facial Movements

Cited by: 1
Authors
Wu, Haozhe [1]
Zhou, Songtao [1]
Jia, Jia [2]
Xing, Junliang [1]
Wen, Qi [3]
Wen, Xiang [3]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China
[3] ByteDance, Hangzhou, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Speech-Driven 3D Face Animation; Regional; Composite; Non-autoregressive;
DOI
10.1145/3581783.3611775
Chinese Library Classification (CLC): TP18 [Theory of Artificial Intelligence]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Speech-driven 3D face animation poses significant challenges due to the intricacy and variability inherent in human facial movements. This paper emphasizes the importance of considering both the composite and regional natures of facial movements in speech-driven 3D face animation. The composite nature pertains to how speech-independent factors globally modulate speech-driven facial movements along the temporal dimension. Meanwhile, the regional nature alludes to the notion that facial movements are not globally correlated but are actuated by local musculature along the spatial dimension. It is thus indispensable to incorporate both natures for engendering vivid animation. To address the composite nature, we introduce an adaptive modulation module that employs arbitrary facial movements to dynamically adjust speech-driven facial movements across frames on a global scale. To accommodate the regional nature, our approach ensures that each constituent of the facial features for every frame focuses on the local spatial movements of 3D faces. Moreover, we present a non-autoregressive backbone for translating audio to 3D facial movements, which maintains high-frequency nuances of facial movements and facilitates efficient inference. Comprehensive experiments and user studies demonstrate that our method surpasses contemporary state-of-the-art approaches both qualitatively and quantitatively.
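The abstract only sketches the method at a high level. As a rough illustration of the two ideas it names, the short PyTorch sketch below shows (a) a FiLM-style adaptive modulation in which a speech-independent code globally rescales and shifts per-frame speech features (the composite nature), and (b) a region-wise decoder in which each slice of the frame feature drives only the vertices of one facial region (the regional nature). All module names, dimensions, the region partition, and the FiLM-style formulation are assumptions for illustration, not the authors' published architecture.

# Illustrative sketch only -- names, sizes, and the scale/shift modulation are assumptions.
import torch
import torch.nn as nn


class AdaptiveModulation(nn.Module):
    # Composite nature: a speech-independent code globally rescales and shifts
    # the speech-driven features of every frame (a FiLM-style assumption).
    def __init__(self, feat_dim: int, style_dim: int):
        super().__init__()
        self.to_scale_shift = nn.Linear(style_dim, 2 * feat_dim)

    def forward(self, speech_feats, style_code):
        # speech_feats: (batch, frames, feat_dim); style_code: (batch, style_dim)
        scale, shift = self.to_scale_shift(style_code).chunk(2, dim=-1)
        return speech_feats * (1.0 + scale.unsqueeze(1)) + shift.unsqueeze(1)


class RegionalDecoder(nn.Module):
    # Regional nature: each slice of the per-frame feature drives only the
    # vertices of one facial region; the partition sizes here are placeholders.
    def __init__(self, feat_dim: int, region_vertex_counts):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, n * 3) for n in region_vertex_counts
        )

    def forward(self, feats):
        # feats: (batch, frames, feat_dim) -> offsets: (batch, frames, vertices, 3)
        offsets = torch.cat([head(feats) for head in self.heads], dim=-1)
        return offsets.view(*feats.shape[:2], -1, 3)


if __name__ == "__main__":
    audio_feats = torch.randn(2, 30, 256)      # per-frame features from a speech encoder
    style_code = torch.randn(2, 64)            # speech-independent factors (assumed)
    modulated = AdaptiveModulation(256, 64)(audio_feats, style_code)
    offsets = RegionalDecoder(256, [1200, 800, 3023])(modulated)  # hypothetical regions
    print(offsets.shape)                       # torch.Size([2, 30, 5023, 3])

Both pieces operate on every frame independently, which is consistent with the non-autoregressive decoding mentioned in the abstract, though the paper's actual backbone may differ.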
Pages: 6822-6830
Number of pages: 9
Related Papers
50 items in total
  • [1] Ji, Xuejie; Su, Zewei; Dong, Lanfang; Li, Guoming. Speech-Driven 3D Facial Animation with Mesh Convolution. 2022 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML), 2022: 14-18.
  • [2] Yan, Juan; Xie, Xiang; Hu, Hao. Speech-driven 3D Facial Animation for Mobile Entertainment. INTERSPEECH 2008: 9th Annual Conference of the International Speech Communication Association, 2008: 2334-2337.
  • [3] Thambiraja, Balamurugan; Habibie, Ikhsanul; Aliakbarian, Sadegh; Cosker, Darren; Theobalt, Christian; Thies, Justus. Imitator: Personalized Speech-driven 3D Facial Animation. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 20564-20574.
  • [4] Fan, Yingruo; Lin, Zhaojiang; Saito, Jun; Wang, Wenping; Komura, Taku. FaceFormer: Speech-Driven 3D Facial Animation with Transformers. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 18749-18758.
  • [5] Shan, SG; Gao, W; Yan, J; Zhang, HM; Chen, XL. Individual 3D face synthesis based on orthogonal photos and speech-driven facial animation. 2000 International Conference on Image Processing (ICIP), Vol. III, 2000: 238-241.
  • [6] Zhang, Xitie; Wu, Suping. CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning. Proceedings of the 4th Annual ACM International Conference on Multimedia Retrieval (ICMR 2024), 2024: 1175-1179.
  • [7] Hong, PY; Wen, Z; Huang, TS; Shum, HY. Real-time speech-driven 3D face animation. First International Symposium on 3D Data Processing Visualization and Transmission, 2002: 713-716.
  • [8] Peng, Ziqiao; Wu, Haoyu; Song, Zhenbo; Xu, Hao; Zhu, Xiangyu; He, Jun; Liu, Hongyan; Fan, Zhaoxin. EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 20630-20640.
  • [9] Stan, Stefan; Haque, Kazi Injamamul; Yumak, Zerrin. FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion. 15th Annual ACM SIGGRAPH Conference on Motion, Interaction and Games (MIG), 2023.
  • [10] Xing, Jinbo; Xia, Menghan; Zhang, Yuechen; Cun, Xiaodong; Wang, Jue; Wong, Tien-Tsin. CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 12780-12790.