DYNAMIC TEMPORAL ALIGNMENT OF SPEECH TO LIPS

Cited by: 0

Authors:

Halperin, Tavi [1]

Ephrat, Ariel [2,3]

Peleg, Shmuel [1]

Affiliations:

[1] Hebrew Univ Jerusalem, Jerusalem, Israel

[2] Google Res, Mountain View, CA USA

[3] HUJI, Jerusalem, Israel

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

Automatic Dialogue Replacement;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Many speech segments in movies are re-recorded in a studio during post-production to compensate for poor sound quality in the on-location recording. We present an audio-to-video method for automating speech-to-lips alignment, which stretches and compresses the audio signal to match the lip movements. The alignment is based on deep audio-visual features that map the lips video and the speech signal to a shared representation. Using this representation, we compute the lip-sync error between every short speech period and every video frame, and then determine the optimal corresponding frame for each short sound period over the entire video clip. We demonstrate successful alignment both quantitatively, using a human perception-inspired metric, and qualitatively. The strongest advantage of our audio-to-video approach is in cases where the original voice is unclear. In these cases, state-of-the-art audio-only methods will fail.
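
The matching step described in the abstract (an error for every pair of short speech period and video frame, then an optimal frame for each speech period) can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation. It assumes that audio and video embeddings in a shared space are already available, uses Euclidean distance as a stand-in for the lip-sync error, and assumes the chosen frame index must never decrease over time. The function names `lip_sync_cost` and `align_segments_to_frames` are hypothetical.

```python
import numpy as np


def lip_sync_cost(audio_emb, video_emb):
    """Pairwise error between audio segments and video frames.

    Assumption: both inputs are arrays of shape (count, dim) in a shared
    embedding space, and Euclidean distance stands in for lip-sync error.
    """
    a = np.asarray(audio_emb, dtype=float)
    v = np.asarray(video_emb, dtype=float)
    # Broadcasting yields cost[i, j] = ||a[i] - v[j]||.
    return np.linalg.norm(a[:, None, :] - v[None, :, :], axis=-1)


def align_segments_to_frames(cost):
    """Pick one video frame per audio segment, minimizing the summed cost.

    Assumption: the frame index is non-decreasing across segments, so the
    audio may be stretched or compressed but never played backwards.
    """
    cost = np.asarray(cost, dtype=float)
    n_seg, n_frames = cost.shape
    total = np.empty_like(cost)                 # best cost ending at (i, j)
    back = np.zeros((n_seg, n_frames), dtype=int)
    total[0] = cost[0]
    for i in range(1, n_seg):
        best_k = 0                              # argmin of total[i-1, :j+1]
        for j in range(n_frames):
            if total[i - 1, j] < total[i - 1, best_k]:
                best_k = j
            back[i, j] = best_k
            total[i, j] = cost[i, j] + total[i - 1, best_k]
    # Backtrack from the cheapest final frame.
    frames = np.empty(n_seg, dtype=int)
    frames[-1] = int(np.argmin(total[-1]))
    for i in range(n_seg - 1, 0, -1):
        frames[i - 1] = back[i, frames[i]]
    return frames


# Toy example: four audio segments, two video frames (1-D embeddings).
audio = [[0.0], [0.0], [1.0], [1.0]]
video = [[0.0], [1.0]]
mapping = align_segments_to_frames(lip_sync_cost(audio, video))
```

In the toy example, the first two audio segments map to frame 0 and the last two to frame 1. Turning such a mapping into an actual time-stretched audio signal is a separate step that this sketch does not cover.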

Pages: 3980-3984

Number of pages: 5
