Audio-Driven Facial Animation with Deep Learning: A Survey

Cited by: 0
Authors
Jiang, Diqiong [1 ]
Chang, Jian [1 ]
You, Lihua [1 ]
Bian, Shaojun [2 ]
Kosk, Robert [1 ]
Maguire, Greg [3 ]
Affiliations
[1] Bournemouth Univ, Natl Ctr Comp Animat, Poole BH12 5BB, England
[2] Buckinghamshire New Univ, Sch Creat & Digital Ind, High Wycombe HP11 2JZ, England
[3] Ulster Univ, Belfast Sch Art, Belfast BT15 1ED, Northern Ireland
Funding
EU Horizon 2020;
Keywords
deep learning; audio processing; talking head; face generation; AUDIOVISUAL CORPUS; SPEECH;
DOI
10.3390/info15110675
Chinese Library Classification (CLC)
TP [automation technology; computer technology];
Subject Classification Code
0812;
Abstract
Audio-driven facial animation is a rapidly evolving field that aims to generate realistic facial expressions and lip movements synchronized with a given audio input. This survey provides a comprehensive review of deep learning techniques applied to audio-driven facial animation, with a focus on both audio-driven facial image animation and audio-driven facial mesh animation. These approaches employ deep learning to map audio inputs directly onto 3D facial meshes or 2D images, enabling the creation of highly realistic and synchronized animations. This survey also explores evaluation metrics, available datasets, and the challenges that remain, such as disentangling lip synchronization and emotions, generalization across speakers, and dataset limitations. Lastly, we discuss future directions, including multi-modal integration, personalized models, and facial attribute modification in animations, all of which are critical for the continued development and application of this technology.
Pages: 24
Related Papers
50 items in total
  • [41] Audio-Driven Robot Upper-Body Motion Synthesis
    Ondras, Jan
    Celiktutan, Oya
    Bremner, Paul
    Gunes, Hatice
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (11) : 5445 - 5454
  • [42] Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
    Tong, Haonan
    Li, Haopeng
    Du, Hongyang
    Yang, Zhaohui
    Yin, Changchuan
    Niyato, Dusit
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2025, 14 (01) : 93 - 97
  • [43] A Survey of Audio Classification Using Deep Learning
    Zaman, Khalid
    Sah, Melike
    Direkoglu, Cem
    Unoki, Masashi
    IEEE ACCESS, 2023, 11 : 106620 - 106649
  • [44] Deep Audio-visual Learning: A Survey
    Hao Zhu
    Man-Di Luo
    Rui Wang
    Ai-Hua Zheng
    Ran He
    International Journal of Automation and Computing, 2021, 18 (03) : 351 - 376
  • [47] Partial linear regression for audio-driven talking head application
    Hsieh, CK
    Chen, YC
    2005 IEEE International Conference on Multimedia and Expo (ICME), Vols 1 and 2, 2005, : 281 - 284
  • [48] Audio-driven Neural Gesture Reenactment with Video Motion Graphs
    Zhou, Yang
    Yang, Jimei
    Li, Dingzeyu
    Saito, Jun
    Aneja, Deepali
    Kalogerakis, Evangelos
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 3408 - 3418
  • [49] Spatially and Temporally Optimized Audio-Driven Talking Face Generation
    Dong, Biao
    Ma, Bo-Yao
    Zhang, Lei
    COMPUTER GRAPHICS FORUM, 2024, 43 (07)
  • [50] Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach
    Pham, Hai X.
    Cheung, Samuel
    Pavlovic, Vladimir
    2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 2328 - 2336