Imitator: Personalized Speech-driven 3D Facial Animation

Cited by: 5

Authors:

Thambiraja, Balamurugan [1]

Habibie, Ikhsanul [2]

Aliakbarian, Sadegh [3]

Cosker, Darren [3]

Theobalt, Christian [2]

Thies, Justus [1]

Affiliations:

[1] Max Planck Inst Intelligent Syst, Tubingen, Germany

[2] Max Planck Inst Informat, Saarland, Germany

[3] Microsoft, Mesh Labs, Cambridge, England

Source:

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023

Keywords:

DOI:

10.1109/ICCV51070.2023.01885

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech-driven 3D facial animation has been widely explored, with applications in gaming, character animation, virtual reality, and telepresence systems. State-of-the-art methods deform the face topology of the target actor to sync the input audio without considering the identity-specific speaking style and facial idiosyncrasies, thus, resulting in unrealistic and inaccurate lip movements. To address this, we present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video and produces novel facial expressions matching the identity-specific speaking style and facial idiosyncrasies of the target actor. Specifically, we train a style-agnostic transformer on a large facial expression dataset which we use as a prior for audio-driven facial expressions. We utilize this prior to optimize for identity-specific speaking style based on a short reference video. To train the prior, we introduce a novel loss function based on detected bilabial consonants to ensure plausible lip closures and consequently improve the realism of the generated expressions. Through detailed experiments and user studies, we show that our approach improves Lip-Sync by 49% and produces expressive facial animations from input audio while preserving the actor's speaking style. Project page: https://balamuruganthambiraja.github.io/Imitator

Pages: 20564 - 20574

Page count: 11

50 items in total

[41] A comprehensive system for facial animation of generic 3D head models driven by speech
Terissi, Lucas D.
Cerda, Mauricio
Gomez, Juan C.
Hitschfeld-Kahler, Nancy
Girau, Bernard
[J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2013
[42] A comprehensive system for facial animation of generic 3D head models driven by speech
Lucas D Terissi
Mauricio Cerda
Juan C Gómez
Nancy Hitschfeld-Kahler
Bernard Girau
[J]. EURASIP Journal on Audio, Speech, and Music Processing, 2013
[43] Speech-driven face synthesis from 3D video
Ypsilos, LA
Hilton, A
Turkmani, A
Jackson, PJB
[J]. 2ND INTERNATIONAL SYMPOSIUM ON 3D DATA PROCESSING, VISUALIZATION, AND TRANSMISSION, PROCEEDINGS, 2004, : 58 - 65
[44] FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning
Haque, Kazi Injamamul
Yumak, Zerrin
[J]. PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023, 2023, : 282 - 291
[45] Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation
Terissi, Lucas D.
Gomez, Juan Carlos
[J]. ADVANCES IN ARTIFICIAL INTELLIGENCE - SBIA 2008, PROCEEDINGS, 2008, 5249 : 33 - 42
[46] Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model
Deena, Salil
Galata, Aphrodite
[J]. ADVANCES IN VISUAL COMPUTING, PT 1, PROCEEDINGS, 2009, 5875 : 89 - 100
[47] 3D facial animation driven by speech-video dual-modal signals
Ji, Xuejie
Liao, Zhouzhou
Dong, Lanfang
Tang, Yingchao
Li, Guoming
Mao, Meng
[J]. COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (05) : 5951 - 5964
[48] Personalized Audio-Driven 3D Facial Animation via Style-Content Disentanglement
Chai, Yujin
Shao, Tianjia
Weng, Yanlin
Zhou, Kun
[J]. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (03) : 1803 - 1820
[49] Speech driven facial animation
Yang, TJ
Lin, IC
Hung, CS
Huang, CF
Ming, OY
[J]. COMPUTER ANIMATION AND SIMULATION'99, 1999, : 99 - 108
[50] Learning Speech-driven 3D Conversational Gestures from Video
Habibie, Ikhsanul
Xu, Weipeng
Mehta, Dushyant
Liu, Lingjie
Seidel, Hans-Peter
Pons-Moll, Gerard
Elgharib, Mohamed
Theobalt, Christian
[J]. PROCEEDINGS OF THE 21ST ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS (IVA), 2021, : 101 - 108
