Imitator: Personalized Speech-driven 3D Facial Animation

Cited by: 5
Authors
Thambiraja, Balamurugan [1]
Habibie, Ikhsanul [2]
Aliakbarian, Sadegh [3]
Cosker, Darren [3]
Theobalt, Christian [2]
Thies, Justus [1]
Affiliations
[1] Max Planck Inst Intelligent Syst, Tubingen, Germany
[2] Max Planck Inst Informat, Saarland, Germany
[3] Microsoft, Mesh Labs, Cambridge, England
DOI
10.1109/ICCV51070.2023.01885
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech-driven 3D facial animation has been widely explored, with applications in gaming, character animation, virtual reality, and telepresence systems. State-of-the-art methods deform the face topology of the target actor to sync with the input audio without considering the identity-specific speaking style and facial idiosyncrasies, resulting in unrealistic and inaccurate lip movements. To address this, we present Imitator, a speech-driven facial expression synthesis method that learns identity-specific details from a short input video and produces novel facial expressions matching the identity-specific speaking style and facial idiosyncrasies of the target actor. Specifically, we train a style-agnostic transformer on a large facial expression dataset, which we use as a prior for audio-driven facial expressions. We utilize this prior to optimize for the identity-specific speaking style based on a short reference video. To train the prior, we introduce a novel loss function based on detected bilabial consonants to ensure plausible lip closures and consequently improve the realism of the generated expressions. Through detailed experiments and user studies, we show that our approach improves lip-sync by 49% and produces expressive facial animations from input audio while preserving the actor's speaking style. Project page: https://balamuruganthambiraja.github.io/Imitator
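Illustrative note: the abstract does not specify the form of the bilabial-consonant loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes each frame already carries a phoneme label and a scalar lip-gap value (distance between upper and lower lip), and it penalizes any remaining gap on frames labelled /p/, /b/, or /m/. The names `BILABIALS`, `bilabial_mask`, and `lip_closure_penalty` are hypothetical.

```python
from typing import List, Sequence

# Assumption: phoneme labels are lowercase strings; /p/, /b/, /m/ are the
# bilabial consonants for which the lips should fully close.
BILABIALS = {"p", "b", "m"}


def bilabial_mask(frame_phonemes: Sequence[str]) -> List[bool]:
    """Return True for every frame whose phoneme label is a bilabial consonant."""
    return [ph in BILABIALS for ph in frame_phonemes]


def lip_closure_penalty(lip_gaps: Sequence[float],
                        frame_phonemes: Sequence[str]) -> float:
    """Mean squared lip gap over bilabial frames (0.0 if there are none).

    This is a hypothetical stand-in for a lip-closure loss term, not the
    loss used in the paper.
    """
    if len(lip_gaps) != len(frame_phonemes):
        raise ValueError("lip_gaps and frame_phonemes must have equal length")
    flagged = [gap for gap, is_bilabial
               in zip(lip_gaps, bilabial_mask(frame_phonemes)) if is_bilabial]
    if not flagged:
        return 0.0
    return sum(gap * gap for gap in flagged) / len(flagged)
```

In a training setup, a term like this would typically be added to a reconstruction loss with some weight, so that open lips on detected bilabial frames are penalized more strongly than elsewhere; the weighting and the detection step are left unspecified here because the abstract does not describe them.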
Pages: 20564-20574
Page count: 11
Related papers
50 in total
  • [1] Speech-Driven 3D Facial Animation with Mesh Convolution
    Ji, Xuejie
    Su, Zewei
    Dong, Lanfang
    Li, Guoming
    [J]. 2022 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, COMPUTER VISION AND MACHINE LEARNING (ICICML), 2022, : 14 - 18
  • [2] Speech-driven 3D Facial Animation for Mobile Entertainment
    Yan, Juan
    Xie, Xiang
    Hu, Hao
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2334 - 2337
  • [3] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
    Fan, Yingruo
    Lin, Zhaojiang
    Saito, Jun
    Wang, Wenping
    Komura, Taku
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18749 - 18758
  • [4] CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning
    Zhang, Xitie
    Wu, Suping
    [J]. PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 1175 - 1179
  • [5] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
    Xing, Jinbo
    Xia, Menghan
    Zhang, Yuechen
    Cun, Xiaodong
    Wang, Jue
    Wong, Tien-Tsin
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 12780 - 12790
  • [6] Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation
    Fu, Hui
    Wang, Zeqing
    Gong, Ke
    Wang, Keze
    Chen, Tianshui
    Li, Haojie
    Zeng, Haifeng
    Kang, Wenxiong
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1770 - 1777
  • [7] FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion
    Stan, Stefan
    Haque, Kazi Injamamul
    Yumak, Zerrin
    [J]. 15TH ANNUAL ACM SIGGRAPH CONFERENCE ON MOTION, INTERACTION AND GAMES, MIG 2023, 2023,
  • [8] Speech-Driven 3D Face Animation with Composite and Regional Facial Movements
    Wu, Haozhe
    Zhou, Songtao
    Jia, Jia
    Xing, Junliang
    Wen, Qi
    Wen, Xiang
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6822 - 6830
  • [9] ANALYZING VISIBLE ARTICULATORY MOVEMENTS IN SPEECH PRODUCTION FOR SPEECH-DRIVEN 3D FACIAL ANIMATION
    Kim, Hyung Kyu
    Lee, Sangmin
    Kim, Hak Gu
    [J]. Proceedings - International Conference on Image Processing, ICIP, 2024, : 3575 - 3579
  • [10] Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation
    He, Shan
    He, Haonan
    Yang, Shuo
    Wu, Xiaoyan
    Xia, Pengcheng
    Yin, Bing
    Liu, Cong
    Dai, Lirong
    Xu, Chang
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 14146 - 14156