DYNAMIC TEMPORAL ALIGNMENT OF SPEECH TO LIPS

Authors
Halperin, Tavi [1]
Ephrat, Ariel [2, 3]
Peleg, Shmuel [1]
Affiliations
[1] Hebrew Univ Jerusalem, Jerusalem, Israel
[2] Google Res, Mountain View, CA USA
[3] HUJI, Jerusalem, Israel
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019
Keywords
Automatic Dialogue Replacement
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Many speech segments in movies are re-recorded in a studio during post-production to compensate for the poor quality of sound recorded on location. We present an audio-to-video method for automating speech-to-lips alignment, stretching and compressing the audio signal to match the lip movements. This alignment is based on deep audio-visual features, which map the lips video and the speech signal to a shared representation. Using this representation, we compute the lip-sync error between every short speech period and every video frame, and then determine the optimal corresponding frame for each short speech period over the entire video clip. We demonstrate successful alignment both quantitatively, using a human perception-inspired metric, and qualitatively. The strongest advantage of our audio-to-video approach is in cases where the original voice is unclear. In these cases, state-of-the-art audio-only methods will fail.
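The two steps named in the abstract (an error for every speech period and video frame pair, then a globally optimal frame for each speech period) can be illustrated with a small sketch. This is not the authors' implementation: the function names `pairwise_sync_error` and `monotonic_alignment` are hypothetical, Euclidean distance in the shared space is an assumed stand-in for the lip-sync error, toy vectors replace the learned deep audio-visual features, and the dynamic program is a generic monotonic assignment rather than the paper's exact formulation.

```python
import numpy as np


def pairwise_sync_error(audio_emb, video_emb):
    """Distance between every audio segment and every video frame.

    Assumption: both inputs already live in a shared embedding space,
    with shapes (n_audio, dim) and (n_video, dim).
    """
    diff = audio_emb[:, None, :] - video_emb[None, :, :]
    return np.linalg.norm(diff, axis=2)  # shape (n_audio, n_video)


def monotonic_alignment(cost):
    """Choose one frame per audio segment, frame indices non-decreasing,
    so that the summed cost over the whole clip is minimal."""
    n_audio, n_video = cost.shape
    total = np.empty_like(cost, dtype=float)  # best cost ending at (i, j)
    back = np.zeros((n_audio, n_video), dtype=int)  # chosen frame for i - 1
    total[0] = cost[0]
    for i in range(1, n_audio):
        prev = total[i - 1]
        # best_k[j] = index of the cheapest previous frame k with k <= j
        best_k = np.zeros(n_video, dtype=int)
        for j in range(1, n_video):
            best_k[j] = j if prev[j] < prev[best_k[j - 1]] else best_k[j - 1]
        total[i] = cost[i] + prev[best_k]
        back[i] = best_k
    # Backtrack from the cheapest final frame
    path = np.empty(n_audio, dtype=int)
    path[-1] = int(np.argmin(total[-1]))
    for i in range(n_audio - 1, 0, -1):
        path[i - 1] = back[i, path[i]]
    return path


# Toy data: 4 video frames with orthogonal embeddings; the audio dwells on
# frame 0, passes frame 1, skips frame 2, and dwells on frame 3.
video = np.eye(4)
audio = video[[0, 0, 1, 3, 3]]
path = monotonic_alignment(pairwise_sync_error(audio, video))
print(path.tolist())  # [0, 0, 1, 3, 3]
```

The resulting frame index per speech period is the kind of mapping that could then drive the stretching and compressing of the audio; how the warp itself is applied is outside this sketch.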
Pages: 3980-3984
Page count: 5