On the selection of the impulse responses for distant-speech recognition based on contaminated speech training

Citations: 0
Authors
Ravanelli, Mirco [1]
Omologo, Maurizio [1]
Affiliation
[1] Fondazione Bruno Kessler, Trento, Italy
Keywords
robust speech recognition; multi-condition training; reverberation
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Distant-speech recognition is a technology of fundamental importance for the future development of assistive applications characterized by flexible and unobtrusive interaction in home environments. State-of-the-art speech recognition still lacks robustness and exhibits unacceptable performance variability due to environmental noise, reverberation effects, and speaker position. In the past, multi-condition training and contamination methods were explored to reduce the mismatch between training and test conditions. However, the performance evaluation can be biased by factors such as the limited number of speaker and microphone positions, the adopted set of impulse responses, and the vocabulary and grammars defining the recognition task. The purpose of this paper is to investigate in more detail some critical aspects that characterize such an experimental context. To this end, our work addressed a microphone network distributed over different rooms of an apartment and a related set of speaker-microphone pairs, leading to a very large set of impulse responses. Besides simulations, the experiments also tackled real speech interactions. The performance evaluation was based on a phone-loop task in order to minimize the influence of linguistic constraints. The experimental results show that an accurate selection of impulse responses is less critical than other factors, such as the signal-to-noise ratio introduced by additive background noise.
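The abstract does not spell out the contamination procedure. As a rough illustration only, contaminated speech is commonly produced by convolving a clean signal with a room impulse response and then adding background noise scaled to a target signal-to-noise ratio. The sketch below follows that common recipe and is not the authors' implementation; the function names (`convolve`, `power`, `contaminate`), the truncation to the clean signal length, and the noise-looping strategy are all assumptions made for this example. It uses only the standard library for clarity; a real pipeline would typically use an array library with FFT-based convolution.

```python
import math


def convolve(signal, impulse_response):
    """Direct-form linear convolution, truncated to len(signal) samples."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        # Only taps that overlap already-available input samples contribute.
        for k in range(min(n + 1, len(impulse_response))):
            acc += impulse_response[k] * signal[n - k]
        out[n] = acc
    return out


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def contaminate(clean, impulse_response, noise, snr_db):
    """Reverberate `clean`, then add `noise` scaled to reach `snr_db`.

    Assumption (hypothetical design choice): the SNR is measured between
    the reverberated speech and the scaled noise, and `noise` is not silent.
    Returns (contaminated, reverberated, scaled_noise).
    """
    reverberated = convolve(clean, impulse_response)

    # Loop the noise segment if it is shorter than the speech, then trim.
    repeats = -(-len(reverberated) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[: len(reverberated)]

    # Choose the gain so that power(reverberated) / power(gain * noise)
    # equals 10 ** (snr_db / 10).
    gain = math.sqrt(power(reverberated) / (power(noise) * 10 ** (snr_db / 10)))
    scaled_noise = [gain * v for v in noise]

    contaminated = [r + n for r, n in zip(reverberated, scaled_noise)]
    return contaminated, reverberated, scaled_noise
```

Under these assumptions, repeating the call over many impulse responses and SNR values would give a multi-condition training set, which is the setting where the relative weight of impulse response choice and SNR can be compared.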
Pages: 1028-1032
Number of pages: 5