Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

Times cited: 0
Authors
Zhou, Hang [1]
Liu, Yu [1]
Liu, Ziwei [1]
Luo, Ping [1]
Wang, Xiaogang [1]
Affiliation
[1] The Chinese University of Hong Kong, Hong Kong, People's Republic of China
DOI
Not available
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech. This is a challenging task because face appearance variation and speech semantics are coupled together in the subtle movements of the talking face regions. Existing works either construct subject-specific face appearance models or model the transformation between lip motion and speech. In this work, we integrate both aspects and enable arbitrary-subject talking face generation by learning a disentangled audio-visual representation. We find that a talking face sequence is a composition of subject-related information and speech-related information. These two spaces are then explicitly disentangled through a novel associative-and-adversarial training process. An advantage of this disentangled representation is that either audio or video can serve as the input for generation. Extensive experiments show that the proposed approach generates realistic talking face sequences for arbitrary subjects, with much clearer lip motion patterns than previous work. We also demonstrate that the learned audio-visual representation is highly useful for automatic lip reading and audio-video retrieval.
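The abstract names an associative-and-adversarial training process but does not give its losses. The following is a minimal sketch under stated assumptions, not the authors' formulation. The adversarial part is modeled as a hypothetical word classifier applied to the subject (identity) feature, with the subject encoder trained to push that classifier's prediction toward a uniform distribution so the feature carries no speech content. The associative part is modeled as a standard contrastive loss that pulls matching audio and video speech embeddings together in a shared space. Retrieval then ranks video clips by cosine similarity to an audio query in that space. All function names, the margin value, and the toy vectors are illustrative assumptions; no gradients or networks are included.

```python
import math
from typing import List, Sequence

# ASSUMPTION: these losses are common stand-ins chosen for illustration;
# they are not taken from the paper.


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_loss(logits: Sequence[float], label: int) -> float:
    """Cross-entropy minimized by the hypothetical word classifier, which
    tries to recover the spoken-word label from the subject feature."""
    return -math.log(softmax(logits)[label])


def confusion_loss(logits: Sequence[float]) -> float:
    """Cross-entropy against a uniform target, minimized by the subject
    encoder. Its minimum, log(K), is reached when the prediction is uniform,
    i.e. when the subject feature reveals nothing about the word."""
    probs = softmax(logits)
    return -sum(math.log(p) for p in probs) / len(probs)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(audio_emb: Sequence[float], video_emb: Sequence[float],
                     matched: bool, margin: float = 1.0) -> float:
    """Associative term: pull matching audio/video speech embeddings
    together; push non-matching pairs at least `margin` apart."""
    d = euclidean(audio_emb, video_emb)
    if matched:
        return d ** 2
    return max(0.0, margin - d) ** 2


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def retrieve(query: Sequence[float], gallery: Sequence[Sequence[float]]) -> List[int]:
    """Rank gallery embeddings (e.g. video clips) by cosine similarity to an
    audio query embedding in the shared speech space; best match first."""
    return sorted(range(len(gallery)),
                  key=lambda i: cosine(query, gallery[i]), reverse=True)


if __name__ == "__main__":
    # Toy 3-word vocabulary; logits stand in for the classifier's output
    # on a subject feature.
    print(confusion_loss([0.0, 0.0, 0.0]))  # log(3), about 1.0986 (the minimum)
    print(confusion_loss([5.0, 0.0, 0.0]))  # larger: the feature leaks a word
    audio = [1.0, 0.0]
    videos = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(retrieve(audio, videos))  # [1, 0, 2]
```

The classifier and the subject encoder pull in opposite directions: the classifier lowers `classifier_loss` by finding word cues, while the encoder lowers `confusion_loss` by removing them. In a real system both would be trained with gradients in an autodiff framework; the plain-Python version only shows what each term measures.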
Pages: 9299-9306
Number of pages: 8