Speaker independent audio-visual speech recognition

Cited by: 0

Authors:

Zhang, Y [1]

Levinson, S [1]

Huang, T [1]

Affiliation:

[1] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA

Source:

2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III | 2000

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

We present a general framework for integrating multimodal sensory signals for spatio-temporal pattern recognition. Statistical methods are used to model time-varying events in a collaborative manner, so that inter-modal co-occurrences are taken into account. We discuss various data fusion strategies, the modeling of inter-modal correlations, and the extraction of statistical parameters for multimodal models. A bimodal speech recognition system is implemented. A speaker-independent experiment is carried out to test the audio-visual speech recognizer under different kinds of noise drawn from a noise database. Using a cross-validation scheme, consistent improvements in word recognition accuracy (WRA) are achieved across different signal-to-noise ratios (SNRs).
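The abstract mentions data fusion strategies and noisy test conditions without spelling them out. As a rough illustration only, the sketch below shows two common textbook fusion options: feature-level (early) fusion, which concatenates per-frame audio and visual vectors, and decision-level (late) fusion, which combines per-word log-likelihoods from two single-modality recognizers with a fixed stream weight. It also includes the usual way of adding database noise to clean speech at a target SNR. This is not the authors' implementation: the function names, the fixed `audio_weight`, the assumption that both streams share a frame rate, and the log-likelihood numbers in the usage part are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def early_fusion(audio: Sequence[Vector], visual: Sequence[Vector]) -> List[List[float]]:
    """Feature-level fusion: concatenate the audio and visual vectors of each frame.

    Assumes (hypothetically) that both streams were already brought to a
    common frame rate, e.g. by interpolating the slower video features.
    """
    if len(audio) != len(visual):
        raise ValueError("audio and visual streams must have the same frame count")
    return [list(a) + list(v) for a, v in zip(audio, visual)]


def late_fusion(audio_ll: Dict[str, float], visual_ll: Dict[str, float],
                audio_weight: float) -> str:
    """Decision-level fusion: weight per-word log-likelihoods from two
    single-modality recognizers and return the best-scoring word."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_ll.keys() != visual_ll.keys():
        raise ValueError("both recognizers must score the same vocabulary")
    fused = {
        word: audio_weight * audio_ll[word] + (1.0 - audio_weight) * visual_ll[word]
        for word in audio_ll
    }
    return max(fused, key=fused.get)


def add_noise_at_snr(signal: Vector, noise: Vector, snr_db: float) -> List[float]:
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals snr_db, then add it."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Amplitude scale that brings the noise power to p_signal / 10^(snr_db / 10).
    scale = math.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [x + scale * n for x, n in zip(signal, noise)]


if __name__ == "__main__":
    # Hypothetical per-word log-likelihoods, not results from the paper.
    audio_scores = {"yes": -10.0, "no": -12.0}
    visual_scores = {"yes": -20.0, "no": -8.0}
    print(late_fusion(audio_scores, visual_scores, 0.9))  # trusts audio: "yes"
    print(late_fusion(audio_scores, visual_scores, 0.5))  # more visual weight: "no"
```

Lowering `audio_weight` shifts trust toward the visual stream, which is the usual motivation for audio-visual recognition at low SNR. How the paper itself chooses or estimates its fusion parameters is not stated in this record.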


Pages: 1073-1076

Number of pages: 4
