FEATURE SPACE VIDEO STREAM CONSISTENCY ESTIMATION FOR DYNAMIC STREAM WEIGHTING IN AUDIO-VISUAL SPEECH RECOGNITION

Cited by: 5
Authors
Terry, Louis H. [1 ]
Shiell, Derek J. [1 ]
Katsaggelos, Aggelos K. [1 ]
Affiliations
[1] Northwestern Univ, Dept Elect Engn & Comp Sci, Evanston, IL 60208 USA
Keywords
Speech Recognition; Hidden Markov Models; Vector Quantization
DOI
10.1109/ICIP.2008.4712005
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Most current audio-visual automatic speech recognition (AV-ASR) systems use static weights to balance audio and visual information during fusion. State-of-the-art research uses audio reliability metrics to change the fusion weights dynamically, which improves overall recognition results. So far, however, incorporating visual reliability metrics into these audio-reliability-based systems has not significantly improved performance. We introduce a new approach to this problem: we infer the "consistency" between the audio and visual information and leverage the existing audio reliability metrics to create a video reliability metric. Our approach is formulated in the extracted feature space and thus does not rely on analyzing the video signal itself. The framework presented in this work is competitive with systems based on audio-only reliability metrics and shows promise of consistently outperforming them.
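The fusion weights mentioned in the abstract are commonly applied as exponents on per-stream likelihoods, which becomes a weighted sum in the log domain. The sketch below illustrates that general idea only; it is not the consistency-based metric proposed in the paper. The function names, the sigmoid mapping, and its `steepness` and `midpoint` parameters are hypothetical choices made for illustration.

```python
import math


def fuse_log_likelihoods(audio_ll, video_ll, audio_weight):
    """Combine per-class log-likelihoods from two streams.

    The video weight is taken as 1 - audio_weight, so the two weights
    sum to one (a common convention, assumed here).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    return [audio_weight * a + video_weight * v
            for a, v in zip(audio_ll, video_ll)]


def audio_weight_from_reliability(reliability, steepness=1.0, midpoint=0.0):
    """Map a scalar audio reliability score to a weight in (0, 1).

    Hypothetical mapping: a sigmoid is one simple way to turn an
    unbounded reliability score into a weight.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (reliability - midpoint)))


# A static system fixes audio_weight once; a dynamic system recomputes
# it per frame or per utterance from a reliability estimate.
audio_ll = [-1.0, -3.0]   # illustrative log-likelihoods for two classes
video_ll = [-4.0, -2.0]
weight = audio_weight_from_reliability(0.0)
fused = fuse_log_likelihoods(audio_ll, video_ll, weight)
```

With a weight of 1.0 the fused scores reduce to the audio scores alone, and with 0.0 to the video scores alone, which is why the choice of weight matters most when one stream is degraded.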
Pages: 1316-1319
Number of pages: 4