Imitator: Personalized Speech-driven 3D Facial Animation

Cited by: 5
Authors
Thambiraja, Balamurugan [1]
Habibie, Ikhsanul [2]
Aliakbarian, Sadegh [3]
Cosker, Darren [3]
Theobalt, Christian [2]
Thies, Justus [1]
Affiliations
[1] Max Planck Institute for Intelligent Systems, Tübingen, Germany
[2] Max Planck Institute for Informatics, Saarland, Germany
[3] Microsoft, Mesh Labs, Cambridge, England
DOI
10.1109/ICCV51070.2023.01885
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech-driven 3D facial animation has been widely explored, with applications in gaming, character animation, virtual reality, and telepresence systems. State-of-the-art methods deform the face topology of the target actor to sync with the input audio without considering the identity-specific speaking style and facial idiosyncrasies, resulting in unrealistic and inaccurate lip movements. To address this, we present Imitator, a speech-driven facial expression synthesis method that learns identity-specific details from a short input video and produces novel facial expressions matching the identity-specific speaking style and facial idiosyncrasies of the target actor. Specifically, we train a style-agnostic transformer on a large facial expression dataset, which we use as a prior for audio-driven facial expressions. We utilize this prior to optimize for identity-specific speaking style based on a short reference video. To train the prior, we introduce a novel loss function based on detected bilabial consonants to ensure plausible lip closures and consequently improve the realism of the generated expressions. Through detailed experiments and user studies, we show that our approach improves lip-sync by 49% and produces expressive facial animations from input audio while preserving the actor's speaking style. Project page: https://balamuruganthambiraja.github.io/Imitator
Pages: 20564-20574
Page count: 11