Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

Cited by: 0

Authors:

Zhou, Hang [1]

Liu, Yu [1]

Liu, Ziwei [1]

Luo, Ping [1]

Wang, Xiaogang [1]

Affiliation:

[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China

Source:

THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2019

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech. This is a challenging task because face appearance variation and the semantics of speech are coupled together in the subtle movements of the talking face regions. Existing works either construct a specific face appearance model for specific subjects or model the transformation between lip motion and speech. In this work, we integrate both aspects and enable arbitrary-subject talking face generation by learning a disentangled audio-visual representation. We find that the talking face sequence is actually a composition of both subject-related information and speech-related information. These two spaces are then explicitly disentangled through a novel associative-and-adversarial training process. This disentangled representation has the advantage that both audio and video can serve as inputs for generation. Extensive experiments show that the proposed approach generates realistic talking face sequences on arbitrary subjects with much clearer lip motion patterns than previous work. We also demonstrate that the learned audio-visual representation is extremely useful for the tasks of automatic lip reading and audio-video retrieval.


Pages: 9299-9306

Page count: 8
