CONTINUOUS VISUAL SPEECH RECOGNITION FOR AUDIO SPEECH ENHANCEMENT

Authors
Benhaim, Eric [1, 2]
Sahbi, Hichem [1]
Vitte, Guillaume [2]
Affiliations
[1] CNRS LTCI, Telecom ParisTech, 46 Rue Barrault, F-75013 Paris, France
[2] Parrot SA, F-75010 Paris, France
Keywords
Visual speech recognition; probabilistic graphical model; belief propagation; model-based speech enhancement
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We introduce a novel non-blind speech enhancement procedure based on visual speech recognition (VSR). The VSR component relies on a generative process that analyzes sequences of talking faces and classifies them into visual speech units known as visemes. We use an effective graphical model that segments and labels a given sequence of talking faces into a sequence of visemes. The model captures unary potentials as well as pairwise interactions: the former model the visual appearance of speech units, while the latter model their interactions using boundary and visual language model activations. Experiments conducted on a standard, challenging dataset show that feeding the VSR results to the speech enhancement procedure clearly outperforms baseline blind methods as well as related work.
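The abstract does not give the topology of the graphical model or the form of its potentials. As a minimal sketch only, assume a linear chain over video frames, one unary log-score per frame and viseme, and a single pairwise log-score table standing in for the boundary and visual language model terms. Under those assumptions, max-product belief propagation reduces to Viterbi decoding, which jointly labels and segments the sequence. The names `max_product_chain` and `to_segments`, and all numbers in the toy data, are hypothetical and are not taken from the paper.

```python
import numpy as np

def max_product_chain(unary, pairwise):
    """Max-product decoding on a chain (assumed structure, not the paper's).

    unary:    (T, K) array of log-scores, frame t having viseme k.
    pairwise: (K, K) array of log-scores for moving from viseme i to viseme j.
    Returns the highest-scoring label sequence as an int array of length T.
    """
    T, K = unary.shape
    score = unary[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in label i at t-1, then moving to j.
        cand = score[:, None] + pairwise
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[t]
    labels = np.empty(T, dtype=int)
    labels[-1] = int(score.argmax())
    for t in range(T - 1, 0, -1):
        labels[t - 1] = backptr[t, labels[t]]
    return labels

def to_segments(labels):
    """Collapse per-frame labels into (viseme, start, end) runs, end exclusive."""
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((int(labels[start]), start, t))
            start = t
    return segments

# Toy data: 6 frames, 2 visemes. Frame 2 weakly prefers viseme 1 (noise).
unary = np.log(np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.4, 0.6],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.1, 0.9],
]))
# The pairwise term favors staying in the same viseme between frames.
pairwise = np.log(np.array([
    [0.9, 0.1],
    [0.1, 0.9],
]))

labels = max_product_chain(unary, pairwise)
segments = to_segments(labels)
print(labels.tolist())  # [0, 0, 0, 0, 1, 1]
print(segments)         # [(0, 0, 4), (1, 4, 6)]
```

In this toy run the pairwise term overrides the noisy unary evidence at frame 2, so the decoder returns two clean segments rather than a spurious one-frame viseme. That illustrates why pairwise interactions matter for segmentation in general; it says nothing about the actual potentials used in the paper.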
Pages: 2244-2248
Page count: 5