Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

Authors
Zhou, Hang [1]
Liu, Yu [1]
Liu, Ziwei [1]
Luo, Ping [1]
Wang, Xiaogang [1]
Affiliation
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Talking face generation aims to synthesize a sequence of face images that corresponds to a clip of speech. The task is challenging because face appearance variation and speech semantics are coupled together in the subtle movements of the talking face regions. Existing works either construct subject-specific face appearance models or model the transformation between lip motion and speech. In this work, we integrate both aspects and enable arbitrary-subject talking face generation by learning a disentangled audio-visual representation. We find that a talking face sequence is a composition of subject-related information and speech-related information. These two spaces are then explicitly disentangled through a novel associative-and-adversarial training process. The disentangled representation has the advantage that both audio and video can serve as inputs for generation. Extensive experiments show that the proposed approach generates realistic talking face sequences for arbitrary subjects with much clearer lip motion patterns than previous work. We also demonstrate that the learned audio-visual representation is highly useful for automatic lip reading and audio-video retrieval.
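The retrieval claim relies on audio and video being embedded in a shared speech-related space. As a rough illustration only, assuming each clip has already been mapped to a fixed-length embedding (the vectors below are made up, and `cosine` and `retrieve` are hypothetical helpers, not the authors' code), audio-to-video retrieval reduces to ranking gallery clips by cosine similarity to the query:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, gallery, k=1):
    """Return indices of the k gallery embeddings most similar to the query."""
    scores = [(cosine(query, g), i) for i, g in enumerate(gallery)]
    scores.sort(reverse=True)  # highest similarity first
    return [i for _, i in scores[:k]]


# Hypothetical speech-related embeddings: one audio clip queries three video clips.
audio_query = [0.9, 0.1, 0.0]
video_gallery = [
    [0.0, 1.0, 0.0],  # clip 0
    [1.0, 0.2, 0.1],  # clip 1
    [0.1, 0.0, 1.0],  # clip 2
]
print(retrieve(audio_query, video_gallery, k=2))
```

The same ranking works in the other direction (a video clip querying audio clips) as long as both modalities live in one embedding space, which is the property the disentangled representation is meant to provide.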
Pages: 9299-9306
Page count: 8