Matching training and test data distributions for robust speech recognition

Cited by: 15
Authors
Molau, S [1]
Keysers, D [1]
Ney, H [1]
Affiliations
[1] Univ Technol, Rhein Westfal TH Aachen, Dept Comp Sci, Lehrstuhl Informat 6, D-52056 Aachen, Germany
Keywords
normalization; feature transformation; feature extraction; noise robustness; histogram normalization; feature space rotation;
DOI
10.1016/S0167-6393(03)00085-2
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In this work normalization techniques in the acoustic feature space are studied that aim at reducing the mismatch between training and test by matching their distributions. Histogram normalization is the first technique explored in detail. The effect of normalization at different signal analysis stages as well as training and test data normalization are investigated. The basic normalization approach is improved by taking care of the variable silence fraction. Feature space rotation is the second technique that is introduced. It accounts for undesired variations in the acoustic signal that are correlated in the feature space dimensions. The interaction of rotation and histogram normalization is analyzed and it is shown that the recognition accuracy is significantly improved by both techniques on corpora with different complexity, acoustic conditions, and speaking styles. The word error rate is reduced from 24.6% to 21.8% on VerbMobil II, a German large vocabulary conversational speech task, and from 16.5% to 15.5% on EuTrans II, an Italian speech corpus of conversational speech over telephone. On the CarNavigation task, a German isolated-word corpus recorded partly in noisy car environments, the word error rate is reduced from 74.2% to 11.1% for heavy mismatch conditions between training and test. (C) 2003 Elsevier B.V. All rights reserved.
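The core idea of histogram normalization, matching the distribution of test features to that of the training features, can be sketched for a single feature dimension as a quantile mapping. This is a minimal illustration and not the authors' implementation: the function name `histogram_normalize`, the use of empirical ranks, and the toy data are all assumptions, and the silence-fraction handling and feature space rotation described in the abstract are not covered.

```python
from bisect import bisect_right


def histogram_normalize(test_values, train_values):
    """Map each test value onto the training distribution (one dimension).

    Hypothetical helper: each test value is replaced by the training value
    that sits at the same empirical quantile, so the normalized test data
    follow the training histogram.
    """
    train_sorted = sorted(train_values)
    test_sorted = sorted(test_values)
    n_train, n_test = len(train_sorted), len(test_sorted)

    normalized = []
    for x in test_values:
        # Number of test values <= x (at least 1, since x is in the list).
        rank = bisect_right(test_sorted, x)
        # Index of the matching training quantile:
        # ceil(rank * n_train / n_test) - 1, in integer arithmetic.
        index = (rank * n_train + n_test - 1) // n_test - 1
        normalized.append(train_sorted[index])
    return normalized


# Toy data (assumed): the test values are a shifted and scaled copy of
# the training values, imitating an acoustic mismatch.
train = [0.0, 1.0, 2.0, 3.0, 4.0]
test = [18.0, 10.0, 14.0, 12.0, 16.0]
print(histogram_normalize(test, train))
```

In a recognizer the mapping would be applied to each feature dimension separately; correlations between dimensions are what the feature space rotation in the abstract is meant to address.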
Pages: 579 - 601
Number of pages: 23