Autoencoder based multi-stream combination for noise robust speech recognition

Cited by: 0

Authors:

Mallidi, Sri Harish [1]

Ogawa, Tetsuji [3]

Vesely, Karel [4]

Nidadavolu, Phani S. [1]

Hermansky, Hynek [1,2]

Affiliations:

[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD USA

[3] Waseda Univ, Dept Comp Sci & Engn, Tokyo, Japan

[4] Brno Univ Technol, Speech FIT Grp, Brno, Czech Republic

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

speech recognition; human-computer interaction; computational paralinguistics

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

The performance of automatic speech recognition (ASR) systems degrades rapidly when there is a mismatch between training and test acoustic conditions. Performance can be improved using a multi-stream framework, which combines posterior probabilities from several classifiers (often deep neural networks (DNNs)) trained on different features/streams. Knowledge about the confidence of each of these classifiers on a noisy test utterance can help in devising better techniques for posterior combination than simple sum and product rules [1]. In this work, we propose to use autoencoders, which are multilayer feedforward neural networks, to estimate this confidence measure. During the training phase, for each stream, an autoencoder is trained on TANDEM features extracted from the corresponding DNN. On employing the autoencoder during the testing phase, we show that the reconstruction error of the autoencoder is correlated with the robustness of the corresponding stream. These error estimates are then used as confidence measures to combine the posterior probabilities generated from each of the streams. Experiments on the Aurora4 and BABEL databases indicate significant improvements, especially when the training and test acoustic conditions are mismatched.
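The combination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how reconstruction errors are mapped to combination weights, so the softmax over negative errors, the `temperature` parameter, the weighted-sum rule, and all function names below are assumptions made only for illustration. The autoencoder itself is represented by any callable that returns a reconstruction of its input.

```python
import numpy as np

def reconstruction_error(features, autoencoder):
    """Mean squared reconstruction error of one stream over an utterance.

    `autoencoder` is a hypothetical callable mapping a (frames, dims)
    array to its reconstruction of the same shape.
    """
    features = np.asarray(features, dtype=float)
    reconstructed = autoencoder(features)
    return float(np.mean((features - reconstructed) ** 2))

def stream_weights(errors, temperature=1.0):
    """Map per-stream errors to weights that sum to 1.

    Assumption: a softmax over negative errors, so a stream with a
    lower reconstruction error receives a higher weight.
    """
    scores = -np.asarray(errors, dtype=float) / temperature
    scores = scores - scores.max()  # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

def combine_posteriors(posteriors, weights):
    """Weighted sum of per-stream posteriors.

    posteriors: array of shape (streams, frames, classes).
    Returns an array of shape (frames, classes) whose rows sum to 1.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    combined = np.tensordot(weights, posteriors, axes=1)
    return combined / combined.sum(axis=-1, keepdims=True)

# Toy usage: two streams, three frames, four classes.
rng = np.random.default_rng(0)
raw = rng.random((2, 3, 4))
toy_posteriors = raw / raw.sum(axis=-1, keepdims=True)
toy_errors = [0.1, 2.0]  # stream 0 reconstructs well, stream 1 poorly
toy_weights = stream_weights(toy_errors)
toy_combined = combine_posteriors(toy_posteriors, toy_weights)
```

In this sketch a lower `temperature` pushes the combination toward selecting the single most reliable stream, while a higher value approaches a plain average of the streams.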

Pages: 3551-3555

Number of pages: 5

50 items in total

[1] Noise Adaptive Stream Fusion Based on Feature Component Rejection for Robust Multi-Stream Speech Recognition
Zhang, Jun
Feng, Yizhi
Ning, Gengxin
Ji, Fei
[J]. 2015 SEVENTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2015, : 279 - 283
[2] Multi-stream speech recognition based on Dempster-Shafer combination rule
Valente, Fabio
[J]. SPEECH COMMUNICATION, 2010, 52 (03) : 213 - 222
[3] Multi-stream adaptive evidence combination for noise robust ASR
Morris, A
Hagen, A
Glotin, H
Bourlard, H
[J]. SPEECH COMMUNICATION, 2001, 34 (1-2) : 25 - 40
[4] Robust multi-stream speech recognition based on weighting the output probabilities of feature components
Zhang, Jun
Wei, Gang
Yu, Hua
[J]. Shengxue Xuebao/Acta Acustica, 2008, 33 (02) : 102 - 108
[5] Phase AutoCorrelation (PAC) features in entropy based multi-stream for robust speech recognition
Ikbal, S
Misra, H
Bourlard, H
Hermansky, H
[J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 205 - 208
[6] Robust multi-stream speech recognition based on weighting the output probabilities of feature components
ZHANG Jun WEI Gang YU Hua NING Genxin (College of Electronic & Information Engineering
[J]. Chinese Journal of Acoustics, 2009, 28 (03) : 269 - 279
[7] Robust Speaker Recognition Based on Multi-Stream Features
Wang, Ning
Wang, Lei
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-CHINA (ICCE-CHINA), 2016,
[8] Multi-Stream Spectro-Temporal Features for Robust Speech Recognition
Zhao, Sherry Y.
Morgan, Nelson
[J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 898 - 901
[9] COMBINATION STRATEGY BASED ON RELATIVE PERFORMANCE MONITORING FOR MULTI-STREAM REVERBERANT SPEECH RECOGNITION
Xiong, Feifei
Goetze, Stefan
Meyer, Bernd T.
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4870 - 4874
[10] Stream fusion for multi-stream automatic speech recognition
Sagha, Hesam
Li, Feipeng
Variani, Ehsan
Millan, Jose del R.
Chavarriaga, Ricardo
Schuller, Bjoern
[J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2016, 19 (04) : 669 - 675
