Matching training and test data distributions for robust speech recognition

Cited by: 15
Authors
Molau, S. [1]
Keysers, D. [1]
Ney, H. [1]
Affiliations
[1] RWTH Aachen University, Department of Computer Science, Lehrstuhl für Informatik 6, D-52056 Aachen, Germany
Keywords
normalization; feature transformation; feature extraction; noise robustness; histogram normalization; feature space rotation
DOI
10.1016/S0167-6393(03)00085-2
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this work, normalization techniques in the acoustic feature space are studied that aim at reducing the mismatch between training and test data by matching their distributions. Histogram normalization is the first technique explored in detail. The effect of normalization at different signal analysis stages is investigated, as well as normalization of the training and test data. The basic normalization approach is improved by accounting for the variable silence fraction. Feature space rotation is the second technique introduced. It accounts for undesired variations in the acoustic signal that are correlated across feature space dimensions. The interaction of rotation and histogram normalization is analyzed, and it is shown that both techniques significantly improve recognition accuracy on corpora with different complexity, acoustic conditions, and speaking styles. The word error rate is reduced from 24.6% to 21.8% on VerbMobil II, a German large-vocabulary conversational speech task, and from 16.5% to 15.5% on EuTrans II, an Italian corpus of conversational speech over the telephone. On the CarNavigation task, a German isolated-word corpus recorded partly in noisy car environments, the word error rate is reduced from 74.2% to 11.1% under heavy mismatch conditions between training and test. (C) 2003 Elsevier B.V. All rights reserved.
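A minimal sketch of the histogram normalization idea summarized above, assuming per-dimension quantile (CDF) matching in NumPy: each test feature dimension is remapped so that its empirical distribution matches that of the training data. The function name histogram_normalize and the 100-point quantile grid are illustrative assumptions, not the authors' implementation; the paper's refinements (normalization at different signal analysis stages, silence-fraction handling, feature space rotation) are not covered here.

import numpy as np

def histogram_normalize(test_feats, train_feats, n_quantiles=100):
    # Generic sketch of histogram normalization via quantile (CDF) matching,
    # applied independently to each feature dimension. Not the authors' code.
    qs = np.linspace(0.0, 1.0, n_quantiles)
    normalized = np.empty_like(test_feats, dtype=float)
    for d in range(test_feats.shape[1]):
        test_q = np.quantile(test_feats[:, d], qs)    # empirical quantiles of the test data
        train_q = np.quantile(train_feats[:, d], qs)  # empirical quantiles of the training data
        # Rank each test value under the test CDF, then map that rank to the
        # value the training CDF assigns at the same rank.
        ranks = np.interp(test_feats[:, d], test_q, qs)
        normalized[:, d] = np.interp(ranks, qs, train_q)
    return normalized

# Usage with stand-in random features (n_frames x n_dims):
train = np.random.randn(5000, 13) * 2.0 + 1.0
test = np.random.randn(800, 13) * 0.5 - 0.3
test_norm = histogram_normalize(test, train)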
Pages: 579-601
Page count: 23