CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation

Cited by: 0
Authors
Liang, Xiangyu [1]
Zhuang, Wenlin [1]
Wang, Tianyong [1]
Geng, Guangxing [2]
Geng, Guangyue [2]
Xia, Haifeng [1]
Xia, Siyu [1]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing, Peoples R China
[2] Nanjing 8 8 Digital Technol Co Ltd, Nanjing, Peoples R China
DOI
10.1109/FG59268.2024.10581920
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech-driven 3D facial animation has been studied for years, but its practical application still falls short of expectations. The main challenges are limited data, lip alignment, and the naturalness of facial expressions. Although lip alignment has been widely studied, existing methods struggle to synthesize natural, realistic expressions, so the resulting animations look mechanical and stiff. Even when emotional features are extracted from speech, the randomness of facial movements limits how effectively emotions are expressed. To address this issue, this paper proposes CSTalk (Correlation Supervised), a method that models the correlations among the movements of different facial regions and uses them to supervise the training of the generative model, so that it generates realistic expressions that conform to human facial motion patterns. To generate more intricate animations, we employ a rich set of control parameters based on the MetaHuman character model and capture a dataset covering five different emotions. We train a generative network with an autoencoder structure and feed it an emotion embedding vector to generate user-controllable expressions. Experimental results demonstrate that our method outperforms existing state-of-the-art methods.
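The abstract does not give the form of the correlation supervision, so the following is only a minimal sketch of one way such a term could look, not the authors' implementation. It assumes each animation frame is a list of control-parameter values, measures the Pearson correlation between every pair of channels over time, and penalizes the squared difference from a reference correlation matrix computed on captured data. The function names (pearson, correlation_matrix, correlation_loss) and the two-channel toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(frames):
    """frames has shape (T, C): T frames, C control parameters. Returns a C x C matrix."""
    channels = list(zip(*frames))  # transpose to one time series per channel
    return [[pearson(ci, cj) for cj in channels] for ci in channels]


def correlation_loss(predicted_frames, reference_corr):
    """Mean squared difference between predicted and reference channel correlations."""
    pred_corr = correlation_matrix(predicted_frames)
    c = len(reference_corr)
    total = sum(
        (pred_corr[i][j] - reference_corr[i][j]) ** 2
        for i in range(c)
        for j in range(c)
    )
    return total / (c * c)


# Hypothetical toy data: two channels that move together in the captured clip.
captured = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
reference = correlation_matrix(captured)

# A generated clip whose channels move in opposite directions violates that pattern.
generated = [[0.0, 1.0], [1.0, 0.0], [2.0, -1.0]]
loss_same = correlation_loss(captured, reference)
loss_generated = correlation_loss(generated, reference)
```

In this sketch the captured clip scores zero against its own correlation matrix, while the generated clip is penalized because its two channels are anti-correlated. In a real training setup such a term would presumably be combined with a reconstruction loss and computed with a differentiable tensor library; both points are assumptions rather than details taken from the abstract.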
Pages: 5