LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild

Cited by: 2
Authors
Chen, Zhipeng [1 ]
Wang, Xinheng [1 ]
Xie, Lun [1 ]
Yuan, Haijie [2 ]
Pan, Hang [3 ]
Affiliations
[1] Univ Sci & Technol Beijing, Beijing 100083, Peoples R China
[2] Xiaoduo Intelligent Technol Beijing Co Ltd, Beijing 100094, Peoples R China
[3] Changzhi Univ, Dept Comp Sci, Changzhi 046011, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Audio-driven generation; Lip synthesis; LPIPS loss; Multimodal fusion; Talking head generation;
DOI
10.1016/j.specom.2023.103028
CLC classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Researchers have shown growing interest in audio-driven talking head generation. The primary challenge in talking head generation is achieving audio-visual coherence between the lips and the audio, known as lip synchronization. This paper proposes a generic method, LPIPS-AttnWav2Lip, for reconstructing face images of any speaker from audio. We use a U-Net architecture based on residual CBAM to better encode and fuse audio and visual modal information. In addition, a semantic alignment module extends the receptive field of the generator network to efficiently capture the spatial and channel information of the visual features, and matches the statistics of the visual features with the audio latent vector so that the audio content information is adjusted and injected into the visual information. To achieve exact lip synchronization and generate realistic, high-quality images, our approach adopts an LPIPS loss, which simulates human judgment of image quality and reduces the possibility of instability during training. Subjective and objective evaluation results demonstrate that the proposed method achieves outstanding lip-synchronization accuracy and visual quality.
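The abstract describes matching the statistics of the visual features with the audio latent vector to inject audio content into the visual stream. A minimal sketch of that statistic-matching idea, in the style of adaptive instance normalization, is shown below; the function name, 1-D interface, and parameter names are illustrative assumptions, not the paper's actual implementation.

```python
import math

def match_stats(visual_feat, audio_mu, audio_sigma, eps=1e-5):
    """Illustrative sketch (not the paper's code): normalize a 1-D visual
    feature to zero mean / unit variance, then re-scale it with mean and
    standard deviation derived from the audio latent vector, so the audio
    statistics are injected into the visual feature."""
    n = len(visual_feat)
    mu = sum(visual_feat) / n
    var = sum((v - mu) ** 2 for v in visual_feat) / n
    sigma = math.sqrt(var + eps)
    # Shift/scale the whitened feature to the audio-derived statistics.
    return [audio_sigma * (v - mu) / sigma + audio_mu for v in visual_feat]

feat = [1.0, 2.0, 3.0, 4.0]
out = match_stats(feat, audio_mu=0.5, audio_sigma=2.0)
```

After the transform, `out` has mean `audio_mu` and standard deviation close to `audio_sigma`, regardless of the input feature's original statistics.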
Pages: 8
Related papers
10 items in total
  • [1] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
    Cheng, Kun
    Cun, Xiaodong
    Zhang, Yong
    Xia, Menghan
    Yin, Fei
    Zhu, Mingrui
    Wang, Xuan
    Wang, Jue
    Wang, Nannan
    PROCEEDINGS SIGGRAPH ASIA 2022, 2022,
  • [2] Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
    Gan, Yuan
    Yang, Zongxin
    Yue, Xihang
    Sun, Lingyun
    Yang, Yi
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22577 - 22588
  • [3] Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion
    Wang, Suzhen
    Li, Lincheng
    Ding, Yu
    Fan, Changjie
    Yu, Xin
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 1098 - 1105
  • [4] Audio-driven Talking Head Generation with Transformer and 3D Morphable Model
    Huang, Ricong
    Zhong, Weizhi
    Li, Guanbin
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 7035 - 7039
  • [5] MergeTalk: Audio-Driven Talking Head Generation From Single Image With Feature Merge
    Gao, Jian
    Shu, Chang
    Zheng, Ximin
    Lu, Zheng
    Bao, Nengsheng
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 1850 - 1854
  • [6] Meta Talk: Learning to Data-Efficiently Generate Audio-Driven Lip-Synchronized Talking Face with High Definition
    Zhang, Yuhan
    He, Weihua
    Li, Minglei
    Tian, Kun
    Zhang, Ziyang
    Cheng, Jie
    Wang, Yaoyuan
    Liao, Jianxing
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4848 - 4852
  • [7] Wav2NeRF: Audio-driven realistic talking head generation via wavelet-based NeRF
    Shin, Ah-Hyung
    Lee, Jae-Ho
    Hwang, Jiwon
    Kim, Yoonhyung
    Park, Gyeong-Moon
    IMAGE AND VISION COMPUTING, 2024, 148
  • [8] Multi-Level Feature Dynamic Fusion Neural Radiance Fields for Audio-Driven Talking Head Generation
    Song, Wenchao
    Liu, Qiong
    Liu, Yanchao
    Zhang, Pengzhou
    Cao, Juan
    APPLIED SCIENCES-BASEL, 2025, 15 (01):
  • [9] VividWav2Lip: High-Fidelity Facial Animation Generation Based on Speech-Driven Lip Synchronization
    Liu, Li
    Wang, Jinhui
    Chen, Shijuan
    Li, Zongmei
    ELECTRONICS, 2024, 13 (18)
  • [10] Wav2Lip-HR: Synthesising clear high-resolution talking head in the wild
    Liang, Chao
    Wang, Qinghua
    Chen, Yunlin
    Tang, Minjie
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2024, 35 (01)