DYNAMIC TEMPORAL ALIGNMENT OF SPEECH TO LIPS

Cited by: 0
Authors
Halperin, Tavi [1]
Ephrat, Ariel [2, 3]
Peleg, Shmuel [1]
Affiliations
[1] Hebrew Univ Jerusalem, Jerusalem, Israel
[2] Google Res, Mountain View, CA USA
[3] HUJI, Jerusalem, Israel
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
Automatic Dialogue Replacement;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Many speech segments in movies are re-recorded in a studio during post-production to compensate for poor sound quality as recorded on location. We present an audio-to-video method for automating speech-to-lips alignment, stretching and compressing the audio signal to match the lip movements. This alignment is based on deep audio-visual features, mapping the lips video and the speech signal to a shared representation. Using this representation, we compute the lip-sync error between every short speech period and every video frame, and then determine the optimal corresponding frame for each short sound period over the entire video clip. We demonstrate successful alignment both quantitatively, using a human perception-inspired metric, and qualitatively. The strongest advantage of our audio-to-video approach is in cases where the original voice is unclear. In these cases, state-of-the-art audio-only methods will fail.
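The two steps named in the abstract (an error between every short speech period and every video frame, then an optimal frame per speech period) can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation. It assumes the embeddings already live in a shared space, uses Euclidean distance as a stand-in for the lip-sync error, and assumes the assigned frames must never move backwards in time. The function names `pairwise_sync_error` and `monotonic_alignment` are hypothetical.

```python
import numpy as np


def pairwise_sync_error(audio_emb, video_emb):
    """Distance between every audio-segment embedding and every video-frame embedding.

    audio_emb: (n_audio, d) array, video_emb: (n_video, d) array.
    Euclidean distance is an assumption standing in for the lip-sync error.
    """
    diff = audio_emb[:, None, :] - video_emb[None, :, :]
    return np.linalg.norm(diff, axis=2)  # shape (n_audio, n_video)


def monotonic_alignment(cost):
    """Pick one video frame per audio segment, minimizing total cost.

    Assumed constraint: the chosen frame index never decreases as the
    audio segment index increases.
    """
    n_audio, n_video = cost.shape
    acc = np.empty((n_audio, n_video), dtype=float)  # accumulated cost
    back = np.zeros((n_audio, n_video), dtype=int)   # backpointers
    acc[0] = cost[0]
    for i in range(1, n_audio):
        best = 0  # index of the cheapest predecessor among frames 0..j
        for j in range(n_video):
            if acc[i - 1, j] < acc[i - 1, best]:
                best = j
            back[i, j] = best
            acc[i, j] = cost[i, j] + acc[i - 1, best]
    # Trace the cheapest path back from the last audio segment.
    path = np.empty(n_audio, dtype=int)
    path[-1] = int(np.argmin(acc[-1]))
    for i in range(n_audio - 1, 0, -1):
        path[i - 1] = back[i, path[i]]
    return path


if __name__ == "__main__":
    # Toy data: 5 video frames, 8 audio segments that dwell on some frames.
    video = np.eye(5)
    audio = video[[0, 0, 1, 2, 2, 3, 4, 4]]
    print(monotonic_alignment(pairwise_sync_error(audio, video)))
```

The returned array gives, for each audio segment, the index of the video frame it is matched to. Repeated indices mean the audio should be compressed there, and skipped indices mean it should be stretched. The time-scale modification of the waveform itself is outside this sketch.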
Pages: 3980-3984
Number of pages: 5