Audio-Driven Facial Animation with Deep Learning: A Survey

Cited by: 0

Authors:

Jiang, Diqiong [1]

Chang, Jian [1]

You, Lihua [1]

Bian, Shaojun [2]

Kosk, Robert [1]

Maguire, Greg [3]

Affiliations:

[1] Bournemouth Univ, Natl Ctr Comp Animat, Poole BH12 5BB, England

[2] Buckinghamshire New Univ, Sch Creat & Digital Ind, High Wycombe HP11 2JZ, England

[3] Ulster Univ, Belfast Sch Art, Belfast BT15 1ED, Northern Ireland

Source:

Information | 2024 / Volume 15 / Issue 11

Funding:

European Union Horizon 2020

Keywords:

deep learning; audio processing; talking head; face generation; audiovisual corpus; speech

DOI:

10.3390/info15110675

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Audio-driven facial animation is a rapidly evolving field that aims to generate realistic facial expressions and lip movements synchronized with a given audio input. This survey provides a comprehensive review of deep learning techniques applied to audio-driven facial animation, with a focus on both audio-driven facial image animation and audio-driven facial mesh animation. These approaches employ deep learning to map audio inputs directly onto 3D facial meshes or 2D images, enabling the creation of highly realistic and synchronized animations. This survey also explores evaluation metrics, available datasets, and the challenges that remain, such as disentangling lip synchronization and emotions, generalization across speakers, and dataset limitations. Lastly, we discuss future directions, including multi-modal integration, personalized models, and facial attribute modification in animations, all of which are critical for the continued development and application of this technology.

Pages: 24