CONTINUOUS VISUAL SPEECH RECOGNITION FOR AUDIO SPEECH ENHANCEMENT

Cited by: 0

Authors:

Benhaim, Eric [1, 2]

Sahbi, Hichem [1]

Vitte, Guillaume [2]

Affiliations:

[1] CNRS LTCI, Telecom ParisTech, 46 Rue Barrault, F-75013 Paris, France

[2] Parrot SA, F-75010 Paris, France

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015

Keywords:

Visual speech recognition; probabilistic graphical model; belief propagation; model-based speech enhancement

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

We introduce a novel non-blind speech enhancement procedure based on visual speech recognition (VSR). VSR relies on a generative process that analyzes sequences of talking faces and classifies them into visual speech units known as visemes. We use a graphical model that segments and labels a given sequence of talking faces into a sequence of visemes. The model captures unary potentials as well as pairwise interactions; the former model the visual appearance of speech units, while the latter model their interactions using boundary and visual language model activations. Experiments conducted on a standard, challenging dataset show that feeding the VSR results to the speech enhancement procedure clearly outperforms baseline blind methods as well as related work.

Pages: 2244-2248

Number of pages: 5
