On the selection of the impulse responses for distant-speech recognition based on contaminated speech training

Cited by: 0

Authors:

Ravanelli, Mirco [1]

Omologo, Maurizio [1]

Affiliations:

[1] Fondazione Bruno Kessler, Trento, Italy

Source:

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014

Keywords:

robust speech recognition; multi-condition training; reverberation

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Distant-speech recognition is a technology of fundamental importance for the future development of assistive applications characterized by flexible and unobtrusive interaction in home environments. State-of-the-art speech recognition still exhibits a lack of robustness and unacceptable performance variability due to environmental noise, reverberation effects, and speaker position. In the past, multi-condition training and contamination methods were explored to reduce the mismatch between training and test conditions. However, the performance evaluation can be biased by factors such as the limited number of speaker and microphone positions, the adopted set of impulse responses, and the vocabulary and grammars defining the recognition task. The purpose of this paper is to investigate in more detail some critical aspects that characterize this experimental context. To this end, our work addressed a microphone network distributed over different rooms of an apartment and a related set of speaker-microphone pairs, leading to a very large set of impulse responses. Besides simulations, the experiments also tackled real speech interactions. The performance evaluation was based on a phone-loop task, in order to minimize the influence of linguistic constraints. The experimental results show that an accurate selection of impulse responses is less critical than other factors, such as the signal-to-noise ratio introduced by additive background noise.
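
Illustrative note: the contamination approach mentioned in the abstract is commonly understood as convolving clean speech with a room impulse response and then adding background noise at a chosen signal-to-noise ratio. The following is a minimal sketch of that general idea, not the authors' actual pipeline; the function names (`convolve`, `power`, `contaminate`), the toy signal, the five-tap impulse response, and the 10 dB target are all assumptions made for illustration.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (the reverberation step)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)


def contaminate(clean, impulse_response, noise, snr_db):
    """Reverberate `clean`, then add `noise` scaled to the requested SNR in dB."""
    reverberated = convolve(clean, impulse_response)
    if len(noise) < len(reverberated):
        raise ValueError("noise must be at least as long as the reverberated signal")
    noise = noise[:len(reverberated)]
    # Gain chosen so that power(reverberated) / power(gain * noise) equals 10^(snr_db/10).
    gain = math.sqrt(power(reverberated) / (power(noise) * 10 ** (snr_db / 10)))
    return [r + gain * n for r, n in zip(reverberated, noise)]


# Toy data (hypothetical): a short sinusoid, a sparse decaying impulse response,
# and Gaussian noise generated with a fixed seed.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
impulse_response = [1.0, 0.0, 0.5, 0.0, 0.25]
noise = [rng.gauss(0.0, 1.0) for _ in range(204)]

contaminated = contaminate(clean, impulse_response, noise, snr_db=10.0)
```

In this sketch the SNR is defined relative to the reverberated signal rather than the clean one; other conventions exist, and the paper may use a different one.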

Citation:

Pages: 1028-1032

Page count: 5

