Rapid feature space speaker adaptation for multi-stream HMM-based audio-visual speech recognition

Authors
Huang, J [1]
Marcheret, E [1]
Visweswariah, K [1]
Affiliations
[1] IBM Corp, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-stream hidden Markov models (HMMs) have recently been very successful in audio-visual speech recognition, where the audio and visual streams are fused at the final decision level. In this paper we investigate fast feature-space speaker adaptation using multi-stream HMMs for audio-visual speech recognition. In particular, we focus on the performance of feature-space maximum likelihood linear regression (fMLLR), a fast and effective method for estimating feature-space transforms. Unlike the common speaker adaptation techniques of MAP or MLLR, fMLLR does not change the audio or visual HMM parameters, but simply applies a single transform to the test features. We also address the problem of fast and robust on-line fMLLR adaptation using feature-space maximum a posteriori linear regression (fMAPLR). Adaptation experiments are reported on the IBM infrared headset audio-visual database. On average, for a 20-speaker, 1-hour independent test set, multi-stream fMLLR achieves a 31% relative gain in the clean audio condition and a 59% relative gain in the noisy audio condition (approximately 7 dB), compared to the baseline multi-stream system.
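The two mechanisms the abstract names can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance Gaussian state models, an already estimated affine transform (A, b) per stream, and a common two-stream formulation in which per-stream log-likelihoods are combined with weights that sum to one. All function names, parameters, and numbers are hypothetical. The Jacobian term log|det A| is left out because, with a single transform per stream, it is the same for every HMM state and does not change which state scores highest.

```python
import math


def apply_fmllr(A, b, x):
    """Apply an affine feature-space transform y = A x + b to one feature vector.

    The model parameters are untouched; only the feature vector changes.
    """
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def diag_gauss_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def multistream_loglik(audio_ll, visual_ll, audio_weight):
    """Weighted combination of audio and visual log-likelihoods.

    Assumes the stream weights sum to one (a common convention, not taken
    from the abstract).
    """
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll


if __name__ == "__main__":
    # Hypothetical 2-dimensional features and transforms for each stream.
    audio_x, visual_x = [1.0, 3.0], [0.5, -0.5]
    A_audio, b_audio = [[2.0, 0.0], [0.0, 1.0]], [1.0, -1.0]
    A_visual, b_visual = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]

    # Adapt the features of each stream, then score them against the
    # unchanged (speaker-independent) state models.
    audio_y = apply_fmllr(A_audio, b_audio, audio_x)
    visual_y = apply_fmllr(A_visual, b_visual, visual_x)
    audio_ll = diag_gauss_loglik(audio_y, [3.0, 2.0], [1.0, 1.0])
    visual_ll = diag_gauss_loglik(visual_y, [0.0, 0.0], [1.0, 1.0])

    print(multistream_loglik(audio_ll, visual_ll, audio_weight=0.7))
```

The sketch keeps adaptation and scoring separate to mirror the abstract's point that fMLLR leaves the audio and visual HMM parameters unchanged and modifies only the features.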
Pages: 338-341
Page count: 4