LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild

Cited by: 2
Authors
Chen, Zhipeng [1 ]
Wang, Xinheng [1 ]
Xie, Lun [1 ]
Yuan, Haijie [2 ]
Pan, Hang [3 ]
Affiliations
[1] Univ Sci & Technol Beijing, Beijing 100083, Peoples R China
[2] Xiaoduo Intelligent Technol Beijing Co Ltd, Beijing 100094, Peoples R China
[3] Changzhi Univ, Dept Comp Sci, Changzhi 046011, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Audio-driven generation; Lip synthesis; LPIPS loss; Multimodal fusion; Talking head generation;
DOI
10.1016/j.specom.2023.103028
CLC classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Researchers have shown growing interest in audio-driven talking head generation. The primary challenge in talking head generation is achieving audio-visual coherence between the lips and the audio, known as lip synchronization. This paper proposes a generic method, LPIPS-AttnWav2Lip, for reconstructing face images of any speaker from audio. We use a U-Net architecture based on residual CBAM to better encode and fuse audio and visual modal information. In addition, a semantic alignment module extends the receptive field of the generator network to efficiently capture the spatial and channel information of the visual features, and matches the statistics of the visual features with the audio latent vector so that the audio content information is adjusted and injected into the visual information. To achieve exact lip synchronization and generate realistic, high-quality images, our approach adopts an LPIPS loss, which simulates human judgment of image quality and reduces the possibility of instability during training. Subjective and objective evaluation results demonstrate that the proposed method achieves outstanding lip-synchronization accuracy and visual quality.
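The abstract describes matching the statistics of the visual features with the audio latent vector to inject audio content into the visual stream. A minimal sketch of that statistic-matching idea, in the style of adaptive instance normalization, is shown below; the function name, 1-D interface, and parameter names are illustrative assumptions, not the paper's actual implementation.

```python
import math

def match_stats(visual_feat, audio_mu, audio_sigma, eps=1e-5):
    """Illustrative sketch (not the paper's code): normalize a 1-D visual
    feature to zero mean / unit variance, then re-scale it with mean and
    standard deviation derived from the audio latent vector, so the audio
    statistics are injected into the visual feature."""
    n = len(visual_feat)
    mu = sum(visual_feat) / n
    var = sum((v - mu) ** 2 for v in visual_feat) / n
    sigma = math.sqrt(var + eps)
    # Shift/scale the whitened feature to the audio-derived statistics.
    return [audio_sigma * (v - mu) / sigma + audio_mu for v in visual_feat]

feat = [1.0, 2.0, 3.0, 4.0]
out = match_stats(feat, audio_mu=0.5, audio_sigma=2.0)
```

After the transform, `out` has mean `audio_mu` and standard deviation close to `audio_sigma`, regardless of the input feature's original statistics.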
Pages: 8
Related papers
10 items in total
  • [1] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
    Cheng, Kun
    Cun, Xiaodong
    Zhang, Yong
    Xia, Menghan
    Yin, Fei
    Zhu, Mingrui
    Wang, Xuan
    Wang, Jue
    Wang, Nannan
    PROCEEDINGS SIGGRAPH ASIA 2022, 2022,
  • [2] Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
    Gan, Yuan
    Yang, Zongxin
    Yue, Xihang
    Sun, Lingyun
    Yang, Yi
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22577 - 22588
  • [3] Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion
    Wang, Suzhen
    Li, Lincheng
    Ding, Yu
    Fan, Changjie
    Yu, Xin
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 1098 - 1105
  • [4] Audio-driven Talking Head Generation with Transformer and 3D Morphable Model
    Huang, Ricong
    Zhong, Weizhi
    Li, Guanbin
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 7035 - 7039
  • [5] MergeTalk: Audio-Driven Talking Head Generation From Single Image With Feature Merge
    Gao, Jian
    Shu, Chang
    Zheng, Ximin
    Lu, Zheng
    Bao, Nengsheng
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 1850 - 1854
  • [6] Meta Talk: Learning to Data-Efficiently Generate Audio-Driven Lip-Synchronized Talking Face with High Definition
    Zhang, Yuhan
    He, Weihua
    Li, Minglei
    Tian, Kun
    Zhang, Ziyang
    Cheng, Jie
    Wang, Yaoyuan
    Liao, Jianxing
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4848 - 4852
  • [7] Wav2NeRF: Audio-driven realistic talking head generation via wavelet-based NeRF
    Shin, Ah-Hyung
    Lee, Jae-Ho
    Hwang, Jiwon
    Kim, Yoonhyung
    Park, Gyeong-Moon
    IMAGE AND VISION COMPUTING, 2024, 148
  • [8] Multi-Level Feature Dynamic Fusion Neural Radiance Fields for Audio-Driven Talking Head Generation
    Song, Wenchao
    Liu, Qiong
    Liu, Yanchao
    Zhang, Pengzhou
    Cao, Juan
    APPLIED SCIENCES-BASEL, 2025, 15 (01):
  • [9] VividWav2Lip: High-Fidelity Facial Animation Generation Based on Speech-Driven Lip Synchronization
    Liu, Li
    Wang, Jinhui
    Chen, Shijuan
    Li, Zongmei
    ELECTRONICS, 2024, 13 (18)
  • [10] Wav2Lip-HR: Synthesising clear high-resolution talking head in the wild
    Liang, Chao
    Wang, Qinghua
    Chen, Yunlin
    Tang, Minjie
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2024, 35 (01)