FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion

被引:11
|
作者
Stan, Stefan [1 ]
Haque, Kazi Injamamul [1 ]
Yumak, Zerrin [1 ]
机构
[1] Univ Utrecht, Utrecht, Netherlands
来源
15TH ANNUAL ACM SIGGRAPH CONFERENCE ON MOTION, INTERACTION AND GAMES, MIG 2023 | 2023年
关键词
facial animation synthesis; deep learning; virtual humans; mesh animation; blendshape animation;
D O I
10.1145/3623264.3624447
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Speech-driven 3D facial animation synthesis has been a challenging task both in industry and research. Recent methods mostly focus on deterministic deep learning methods meaning that given a speech input, the output is always the same. However, in reality, the non-verbal facial cues that reside throughout the face are non-deterministic in nature. In addition, majority of the approaches focus on 3D vertex based datasets and methods that are compatible with existing facial animation pipelines with rigged characters is scarce. To eliminate these issues, we present FaceDiffuser, a non-deterministic deep learning model to generate speech-driven facial animations that is trained with both 3D vertex and blendshape based datasets. Our method is based on the diffusion technique and uses the pre-trained large speech representation model HuBERT to encode the audio input. To the best of our knowledge, we are the first to employ the diffusion method for the task of speech-driven 3D facial animation synthesis. We have run extensive objective and subjective analyses and show that our approach achieves better or comparable results in comparison to the state-of-the-art methods. We also introduce a new in-house dataset that is based on a blendshape based rigged character. The code and the dataset will be publicly available on the project page(1).
引用
收藏
页数:11
相关论文
共 50 条
  • [1] Speech-Driven 3D Facial Animation with Mesh Convolution
    Ji, Xuejie
    Su, Zewei
    Dong, Lanfang
    Li, Guoming
    2022 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, COMPUTER VISION AND MACHINE LEARNING (ICICML), 2022, : 14 - 18
  • [2] Imitator: Personalized Speech-driven 3D Facial Animation
    Thambiraja, Balamurugan
    Habibie, Ikhsanul
    Aliakbarian, Sadegh
    Cosker, Darren
    Theobalt, Christian
    Thies, Justus
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 20564 - 20574
  • [3] Speech-driven 3D Facial Animation for Mobile Entertainment
    Yan, Juan
    Xie, Xiang
    Hu, Hao
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2334 - 2337
  • [4] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
    Fan, Yingruo
    Lin, Zhaojiang
    Saito, Jun
    Wang, Wenping
    Komura, Taku
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18749 - 18758
  • [5] CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning
    Zhang, Xitie
    Wu, Suping
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 1175 - 1179
  • [6] Individual 3D face synthesis based on orthogonal photos and speech-driven facial animation
    Shan, SG
    Gao, W
    Yan, J
    Zhang, HM
    Chen, XL
    2000 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL III, PROCEEDINGS, 2000, : 238 - 241
  • [7] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
    Xing, Jinbo
    Xia, Menghan
    Zhang, Yuechen
    Cun, Xiaodong
    Wang, Jue
    Wong, Tien-Tsin
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 12780 - 12790
  • [8] Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation
    Fu, Hui
    Wang, Zeqing
    Gong, Ke
    Wang, Keze
    Chen, Tianshui
    Li, Haojie
    Zeng, Haifeng
    Kang, Wenxiong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1770 - 1777
  • [9] KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding
    Xu, Zhihao
    Gong, Shengjie
    Tang, Jiapeng
    Liang, Lingyu
    Huang, Yining
    Li, Haojie
    Huang, Shuangping
    COMPUTER VISION - ECCV 2024, PT LVI, 2025, 15114 : 236 - 253
  • [10] Speech-Driven 3D Face Animation with Composite and Regional Facial Movements
    Wu, Haozhe
    Zhou, Songtao
    Jia, Jia
    Xing, Junliang
    Wen, Qi
    Wen, Xiang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6822 - 6830