Matching training and test data distributions for robust speech recognition

Cited by: 15
Authors
Molau, S [1]
Keysers, D [1]
Ney, H [1]
Affiliations
[1] Univ Technol, Rhein Westfal TH Aachen, Dept Comp Sci, Lehrstuhl Informat 6, D-52056 Aachen, Germany
Keywords
normalization; feature transformation; feature extraction; noise robustness; histogram normalization; feature space rotation
DOI
10.1016/S0167-6393(03)00085-2
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this work normalization techniques in the acoustic feature space are studied that aim at reducing the mismatch between training and test by matching their distributions. Histogram normalization is the first technique explored in detail. The effect of normalization at different signal analysis stages as well as training and test data normalization are investigated. The basic normalization approach is improved by taking care of the variable silence fraction. Feature space rotation is the second technique that is introduced. It accounts for undesired variations in the acoustic signal that are correlated in the feature space dimensions. The interaction of rotation and histogram normalization is analyzed and it is shown that the recognition accuracy is significantly improved by both techniques on corpora with different complexity, acoustic conditions, and speaking styles. The word error rate is reduced from 24.6% to 21.8% on VerbMobil II, a German large vocabulary conversational speech task, and from 16.5% to 15.5% on EuTrans II, an Italian speech corpus of conversational speech over telephone. On the CarNavigation task, a German isolated-word corpus recorded partly in noisy car environments, the word error rate is reduced from 74.2% to 11.1% for heavy mismatch conditions between training and test. (C) 2003 Elsevier B.V. All rights reserved.
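As a rough illustration of the distribution-matching idea described in the abstract, the following minimal sketch maps the values of one test feature dimension onto the training distribution by quantile matching. It is not the authors' implementation: the function names (`rank_to_probability`, `quantile`, `histogram_normalize`) are hypothetical, exact empirical quantiles stand in for binned cumulative histograms, and the silence-fraction handling and feature space rotation mentioned in the abstract are not modeled.

```python
from bisect import bisect_left


def rank_to_probability(sorted_values, x):
    """Empirical CDF position of x in [0, 1], based on its rank.

    Assumption: the smallest value maps to 0 and the largest to 1.
    """
    n = len(sorted_values)
    if n == 1:
        return 0.5  # a single value carries no rank information
    return bisect_left(sorted_values, x) / (n - 1)


def quantile(sorted_values, p):
    """Inverse empirical CDF with linear interpolation, for p in [0, 1]."""
    n = len(sorted_values)
    pos = p * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def histogram_normalize(test_values, train_values):
    """Transform test values so their distribution matches the training one.

    Each test value is replaced by the training value that sits at the
    same cumulative probability. The mapping is monotonic, so the order
    of the test values is preserved.
    """
    train_sorted = sorted(train_values)
    test_sorted = sorted(test_values)
    return [
        quantile(train_sorted, rank_to_probability(test_sorted, x))
        for x in test_values
    ]


if __name__ == "__main__":
    # Test data that is a shifted and scaled copy of the training data
    # is mapped back onto the training values.
    train = [0.0, 1.0, 2.0, 3.0, 4.0]
    test = [10.0, 12.0, 14.0, 16.0, 18.0]
    print(histogram_normalize(test, train))
```

In a recognizer, a transformation of this kind would typically be applied to each feature dimension separately, which is why correlated variations across dimensions call for an additional step such as the rotation the abstract describes.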
Pages: 579-601
Page count: 23
Related papers
50 in total
  • [21] Validation of Speech Data for Training Automatic Speech Recognition Systems
    Krizaj, Janes
    Gros, Jerneja Zganec
    Dobrisek, Simon
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 1165 - 1169
  • [22] Robust Recognition of Conversational Telephone Speech via Multi-condition Training and Data Augmentation
    Malek, Jiri
    Zdansky, Jindrich
    Cerva, Petr
    TEXT, SPEECH, AND DIALOGUE (TSD 2018), 2018, 11107 : 324 - 333
  • [23] Limited training data robust speech recognition using kernel-based acoustic models
    Schaffoener, Martin
    Krueger, Sven E.
    Andelic, Edin
    Katz, Marcel
    Wendemuth, Andreas
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 1137 - 1140
  • [24] A mismatch-aware stochastic matching algorithm for robust speech recognition
    Liao, YF
    Lin, JS
    Chen, JH
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS: SPEECH II; INDUSTRY TECHNOLOGY TRACKS; DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS; NEURAL NETWORKS FOR SIGNAL PROCESSING, 2003, : 101 - 104
  • [25] Maximum-likelihood approach to stochastic matching for robust speech recognition
    Sankar, A
    Lee, CH
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1996, 4 (03): : 190 - 202
  • [26] Modeling of the PSTN Channel and multireferences training in robust speech recognition
    Preiss, R.
    2006 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-5, 2006, : 2417 - 2420
  • [27] JOINT NOISE ADAPTIVE TRAINING FOR ROBUST AUTOMATIC SPEECH RECOGNITION
    Narayanan, Arun
    Wang, DeLiang
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [28] A Global Discriminant Joint Training Framework for Robust Speech Recognition
    Li, Lujun
    Kuerzinger, Ludwig
    Watzel, Tobias
    Rigoll, Gerhard
    2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 544 - 551
  • [29] Modeling of the PSTN channel and multireferences training in robust speech recognition
    Preiss, R.
    Gabrea, M.
    2006 IEEE INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS, VOLS 1-7, 2006, : 589 - +
  • [30] Robust Submodular Data Partitioning for Distributed Speech Recognition
    Qi, Jun
    Tejedor, Javier
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2254 - 2258