DYNAMIC TEMPORAL ALIGNMENT OF SPEECH TO LIPS

Citations: 0
Authors
Halperin, Tavi [1]
Ephrat, Ariel [2, 3]
Peleg, Shmuel [1]
Affiliations
[1] Hebrew Univ Jerusalem, Jerusalem, Israel
[2] Google Res, Mountain View, CA USA
[3] HUJI, Jerusalem, Israel
Keywords
Automatic Dialogue Replacement
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Many speech segments in movies are re-recorded in a studio during post-production to compensate for poor sound quality in the on-location recording. We present an audio-to-video method for automating speech-to-lips alignment, stretching and compressing the audio signal to match the lip movements. The alignment is based on deep audio-visual features that map the lip video and the speech signal to a shared representation. Using this representation, we compute the lip-sync error between every short speech period and every video frame, and then determine the optimal corresponding frame for each short speech period over the entire video clip. We demonstrate successful alignment both quantitatively, using a human perception-inspired metric, and qualitatively. The strongest advantage of our audio-to-video approach is in cases where the original voice is unclear. In these cases, state-of-the-art audio-only methods will fail.
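The two steps named in the abstract (an error for every speech-period/frame pair, then an optimal frame per speech period) can be illustrated with a generic dynamic time warping pass. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the features, the error measure, or the path constraints. Here the embeddings are assumed to be already computed in a shared space, the lip-sync error is assumed to be a Euclidean distance, and the function names `pairwise_error`, `align`, and `frame_per_segment` are hypothetical.

```python
import numpy as np

def pairwise_error(audio_emb, video_emb):
    """Error between every speech period (row) and every video frame (column).
    Assumption: Euclidean distance in the shared embedding space."""
    diff = audio_emb[:, None, :] - video_emb[None, :, :]
    return np.linalg.norm(diff, axis=2)

def align(cost):
    """Standard dynamic time warping: minimum-cost monotonic path through
    the cost matrix, returned as (speech_index, frame_index) pairs."""
    n_audio, n_video = cost.shape
    acc = np.full((n_audio + 1, n_video + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n_audio + 1):
        for j in range(1, n_video + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    # Backtrack from the last cell to the first cell.
    i, j = n_audio, n_video
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return path

def frame_per_segment(path, num_segments):
    """Pick one frame per speech period (here: the last frame matched to it)."""
    frames = [0] * num_segments
    for a, v in path:
        frames[a] = v
    return frames

# Toy usage: the "audio" is the "video" with frames 0 and 2 held twice.
video = np.eye(4)
audio = video[[0, 0, 1, 2, 2, 3]]
path = align(pairwise_error(audio, video))
print(frame_per_segment(path, len(audio)))  # [0, 0, 1, 2, 2, 3]
```

In a real pipeline the resulting frame index per speech period would drive the time-stretching of the audio; that step is outside this sketch.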
Pages: 3980-3984
Page count: 5
Related papers
50 items in total
  • [1] Analysis of Speech and Singing Signals for Temporal Alignment
    Vijayan, Karthika
    Gao, Xiaoxue
    Li, Haizhou
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1893 - 1898
  • [2] ON TEMPORAL ALIGNMENT OF SENTENCES OF NATURAL AND SYNTHETIC SPEECH
    HOHNE, HD
    COKER, C
    LEVINSON, SE
    RABINER, LR
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1983, 31 (04): : 807 - 813
  • [3] Quantifying temporal speech reduction in French using forced speech alignment
    Adda-Decker, Martine
    Snoeren, Natalie D.
    JOURNAL OF PHONETICS, 2011, 39 (03) : 261 - 270
  • [4] A Dynamic Alignment Algorithm for Imperfect Speech and Transcript
    Tao, Ye
    Li, Xueqing
    Wu, Bian
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2010, 7 (01) : 75 - 84
  • [5] Speech perception and its temporal dynamic
    Specht, K
    Shah, NJ
    Jäncke, L
    NEUROIMAGE, 2001, 13 (06) : S609 - S609
  • [7] On including temporal constraints in Viterbi alignment for speech recognition in noise
    Yoma, NB
    McInnes, FR
    Jack, MA
    Stump, SD
    Ling, LL
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (02): : 179 - 182
  • [8] Prosodic Temporal Alignment of Co-speech Gestures to Speech Facilitates Referent Resolution
    Jesse, Alexandra
    Johnson, Elizabeth K.
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 2012, 38 (06) : 1567 - 1581
  • [9] Inner lips feature extraction based on CLNF with hybrid dynamic template for Cued Speech
    Liu, Li
    Feng, Gang
    Beautemps, Denis
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2017