Matching training and test data distributions for robust speech recognition

Cited by: 15
Authors
Molau, S [1]
Keysers, D [1]
Ney, H [1]
Affiliations
[1] Univ Technol, Rhein Westfal TH Aachen, Dept Comp Sci, Lehrstuhl Informat 6, D-52056 Aachen, Germany
Keywords
normalization; feature transformation; feature extraction; noise robustness; histogram normalization; feature space rotation
DOI
10.1016/S0167-6393(03)00085-2
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this work normalization techniques in the acoustic feature space are studied that aim at reducing the mismatch between training and test by matching their distributions. Histogram normalization is the first technique explored in detail. The effect of normalization at different signal analysis stages as well as training and test data normalization are investigated. The basic normalization approach is improved by taking care of the variable silence fraction. Feature space rotation is the second technique that is introduced. It accounts for undesired variations in the acoustic signal that are correlated in the feature space dimensions. The interaction of rotation and histogram normalization is analyzed and it is shown that the recognition accuracy is significantly improved by both techniques on corpora with different complexity, acoustic conditions, and speaking styles. The word error rate is reduced from 24.6% to 21.8% on VerbMobil II, a German large vocabulary conversational speech task, and from 16.5% to 15.5% on EuTrans II, an Italian speech corpus of conversational speech over telephone. On the CarNavigation task, a German isolated-word corpus recorded partly in noisy car environments, the word error rate is reduced from 74.2% to 11.1% for heavy mismatch conditions between training and test. (C) 2003 Elsevier B.V. All rights reserved.
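The core idea of histogram normalization named in the abstract, matching the test feature distribution to the training distribution, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple per-dimension quantile mapping x' = F_train^{-1}(F_test(x)), and the function names, the number of quantiles, and the synthetic data are all hypothetical. The silence-fraction refinement and the feature space rotation described in the abstract are not covered.

```python
import numpy as np


def fit_quantiles(features, n_quantiles=101):
    """Return a per-dimension quantile table of shape (n_quantiles, n_dims).

    `features` is an array of shape (n_frames, n_dims). The table is an
    approximation of the inverse cumulative distribution of each dimension.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    return np.quantile(features, probs, axis=0)


def histogram_normalize(test, train_quantiles):
    """Map each test dimension so that its empirical distribution
    approximately matches the training distribution (illustrative only)."""
    test_quantiles = fit_quantiles(test, train_quantiles.shape[0])
    out = np.empty(test.shape, dtype=float)
    for d in range(test.shape[1]):
        # A value at the q-th test quantile is mapped to the q-th training
        # quantile; values in between are linearly interpolated.
        out[:, d] = np.interp(test[:, d], test_quantiles[:, d], train_quantiles[:, d])
    return out


# Synthetic example (assumed data): the test features are a shifted and
# scaled version of the training distribution, imitating a mismatch.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=(5000, 3))
test = rng.normal(loc=3.0, scale=0.5, size=(5000, 3))

train_q = fit_quantiles(train)
test_norm = histogram_normalize(test, train_q)
```

After the mapping, the per-dimension mean and spread of `test_norm` should be close to those of `train`, while the ordering of values within each dimension is preserved because the mapping is monotonic.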
Pages: 579-601
Page count: 23