Matching training and test data distributions for robust speech recognition

Cited by: 15

Authors:

Molau, S [1]

Keysers, D [1]

Ney, H [1]

Affiliations:

[1] Univ Technol, Rhein Westfal TH Aachen, Dept Comp Sci, Lehrstuhl Informat 6, D-52056 Aachen, Germany

Source:

SPEECH COMMUNICATION | 2003 / Vol. 41 / Issue 4

Keywords:

normalization; feature transformation; feature extraction; noise robustness; histogram normalization; feature space rotation;

DOI:

10.1016/S0167-6393(03)00085-2

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In this work normalization techniques in the acoustic feature space are studied that aim at reducing the mismatch between training and test by matching their distributions. Histogram normalization is the first technique explored in detail. The effect of normalization at different signal analysis stages as well as training and test data normalization are investigated. The basic normalization approach is improved by taking care of the variable silence fraction. Feature space rotation is the second technique that is introduced. It accounts for undesired variations in the acoustic signal that are correlated in the feature space dimensions. The interaction of rotation and histogram normalization is analyzed and it is shown that the recognition accuracy is significantly improved by both techniques on corpora with different complexity, acoustic conditions, and speaking styles. The word error rate is reduced from 24.6% to 21.8% on VerbMobil II, a German large vocabulary conversational speech task, and from 16.5% to 15.5% on EuTrans II, an Italian speech corpus of conversational speech over telephone. On the CarNavigation task, a German isolated-word corpus recorded partly in noisy car environments, the word error rate is reduced from 74.2% to 11.1% for heavy mismatch conditions between training and test. (C) 2003 Elsevier B.V. All rights reserved.
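The abstract names histogram normalization as the first technique, that is, matching the test feature distribution to the training one. A minimal sketch of that general idea for a single feature dimension follows. It is not the authors' implementation: the function names (`rank_fraction`, `quantile`, `histogram_normalize`), the mid-rank tie handling, and the linear interpolation are all assumptions made for illustration, and the silence-fraction refinement and feature space rotation mentioned in the abstract are not covered.

```python
from bisect import bisect_left, bisect_right


def rank_fraction(sorted_values, x):
    """Empirical CDF position of x in [0, 1], using the mid-rank for ties."""
    n = len(sorted_values)
    if n == 1:
        return 0.5
    left = bisect_left(sorted_values, x)
    right = bisect_right(sorted_values, x)
    mid_rank = (left + right - 1) / 2
    # Clamp so values outside the sample still map into [0, 1].
    return min(max(mid_rank / (n - 1), 0.0), 1.0)


def quantile(sorted_values, p):
    """Inverse empirical CDF, linearly interpolated between order statistics."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def histogram_normalize(test_values, train_values):
    """Map each test value to the training value at the same cumulative rank.

    Hypothetical helper: one feature dimension, values given as plain lists.
    """
    if not test_values or not train_values:
        raise ValueError("both samples must be non-empty")
    train_sorted = sorted(train_values)
    test_sorted = sorted(test_values)
    return [
        quantile(train_sorted, rank_fraction(test_sorted, x))
        for x in test_values
    ]


# Toy data (made up): the test values are a shifted, scaled copy of training.
train = [0.0, 1.0, 2.0, 3.0, 4.0]
test = [12.0, 10.0, 18.0, 14.0, 16.0]
normalized = histogram_normalize(test, train)  # [1.0, 0.0, 4.0, 2.0, 3.0]
```

In this toy case the mapping undoes the shift and scaling exactly, because both samples have the same size and the same rank structure. With real acoustic features the mapping would be estimated per dimension from much larger samples.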

Pages: 579-601

Page count: 23

50 items in total

[21] Validation of Speech Data for Training Automatic Speech Recognition Systems
Krizaj, Janes
Gros, Jerneja Zganec
Dobrisek, Simon
2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 1165 - 1169
[22] Robust Recognition of Conversational Telephone Speech via Multi-condition Training and Data Augmentation
Malek, Jiri
Zdansky, Jindrich
Cerva, Petr
TEXT, SPEECH, AND DIALOGUE (TSD 2018), 2018, 11107 : 324 - 333
[23] Limited training data robust speech recognition using kernel-based acoustic models
Schaffoener, Martin
Krueger, Sven E.
Andelic, Edin
Katz, Marcel
Wendemuth, Andreas
2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 1137 - 1140
[24] A mismatch-aware stochastic matching algorithm for robust speech recognition
Liao, YF
Lin, JS
Chen, JH
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS: SPEECH II; INDUSTRY TECHNOLOGY TRACKS; DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS; NEURAL NETWORKS FOR SIGNAL PROCESSING, 2003, : 101 - 104
[25] Maximum-likelihood approach to stochastic matching for robust speech recognition
Sankar, A
Lee, CH
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1996, 4 (03): : 190 - 202
[26] Modeling of the PSTN Channel and multireferences training in robust speech recognition
Preiss, R.
2006 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-5, 2006, : 2417 - 2420
[27] JOINT NOISE ADAPTIVE TRAINING FOR ROBUST AUTOMATIC SPEECH RECOGNITION
Narayanan, Arun
Wang, DeLiang
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[28] A Global Discriminant Joint Training Framework for Robust Speech Recognition
Li, Lujun
Kuerzinger, Ludwig
Watzel, Tobias
Rigoll, Gerhard
2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 544 - 551
[29] Modeling of the PSTN channel and multireferences training in robust speech recognition
Preiss, R.
Gabrea, M.
2006 IEEE INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS, VOLS 1-7, 2006, : 589 - +
[30] Robust Submodular Data Partitioning for Distributed Speech Recognition
Qi, Jun
Tejedor, Javier
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2254 - 2258
