FEATURE SPACE VIDEO STREAM CONSISTENCY ESTIMATION FOR DYNAMIC STREAM WEIGHTING IN AUDIO-VISUAL SPEECH RECOGNITION

Cited by: 5

Authors:

Terry, Louis H. [1]

Shiell, Derek J. [1]

Katsaggelos, Aggelos K. [1]

Affiliations:

[1] Northwestern Univ, Dept Elect Engn & Comp Sci, Evanston, IL 60208 USA

Source:

2008 15TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-5 | 2008

Keywords:

Speech Recognition; Hidden Markov Models; Vector Quantization;

DOI:

10.1109/ICIP.2008.4712005

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Most current audio-visual automatic speech recognition (AV-ASR) systems use static weights to balance audio and visual information during information fusion. State-of-the-art research has led to the use of audio reliability metrics to change the fusion weights dynamically, which has improved overall recognition results. So far, however, incorporating visual reliability metrics into these audio-reliability-metric-based systems has not significantly improved performance. We introduce a new approach to this problem: inferring the "consistency" between the audio and visual information and leveraging the existing audio reliability metrics to create a video reliability metric. Our approach is formulated in the extracted feature space and thus does not rely on analyzing the actual video signal itself. The framework presented in this work competes with systems based on audio-only reliability metrics and shows promise of consistently outperforming them.

Citation

Pages: 1316-1319

Page count: 4

50 items in total

[1] On Dynamic Stream Weighting for Audio-Visual Speech Recognition
Estellers, Virginia
Gurban, Mihai
Thiran, Jean-Philippe
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (04) : 1145 - 1157
[2] Noise adaptive stream weighting in audio-visual speech recognition
Heckmann, M
Berthommier, F
Kroschel, K
[J]. EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2002, 2002 (11) : 1260 - 1273
[3] Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition
Martin Heckmann
Frédéric Berthommier
Kristian Kroschel
[J]. EURASIP Journal on Advances in Signal Processing, 2002
[4] Dynamic stream weight modeling for audio-visual speech recognition
Marcheret, Etienne
Libal, Vit
Potamianos, Gerasimos
[J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 945 - +
[5] Stream weight estimation for multistream audio-visual speech recognition in a multispeaker environment
Shao, Xu
Barker, Jon
[J]. SPEECH COMMUNICATION, 2008, 50 (04) : 337 - 353
[6] Rapid feature space speaker adaptation for multi-stream HMM-based audio-visual speech recognition
Huang, J
Marcheret, E
Visweswariah, K
[J]. 2005 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1 AND 2, 2005, : 338 - 341
[7] Multi-stream asynchrony modeling for audio-visual speech recognition
Lv, Guoyun
Jiang, Dongmei
Zhao, Rongchun
Hou, Yunshu
[J]. ISM 2007: NINTH IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA, PROCEEDINGS, 2007, : 37 - 44
[8] Discriminative training of HMM stream exponents for audio-visual speech recognition
Potamianos, G
Graf, HP
[J]. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 3733 - 3736
[9] Asynchronous stream modeling for large vocabulary audio-visual speech recognition
Luettin, J
Potamianos, G
Neti, C
[J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 169 - 172
[10] Improved features and dynamic stream weight adaption for robust Audio-Visual Speech Recognition framework
Saudi, Ali S.
Khalil, Mahmoud I.
Abbas, Hazem M.
[J]. DIGITAL SIGNAL PROCESSING, 2019, 89 : 17 - 29
