On the selection of the impulse responses for distant-speech recognition based on contaminated speech training

Cited by: 0

Authors:

Ravanelli, Mirco [1]

Omologo, Maurizio [1]

Affiliations:

[1] Fondazione Bruno Kessler, Trento, Italy

Source:

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014

Keywords:

robust speech recognition; multi-condition training; reverberation;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Distant-speech recognition represents a technology of fundamental importance for future development of assistive applications characterized by flexible and unobtrusive interaction in home environments. State-of-the-art speech recognition still exhibits lack of robustness, and an unacceptable performance variability, due to environmental noise, reverberation effects, and speaker position. In the past, multi-condition training and contamination methods were explored to reduce the mismatch between training and test conditions. However, the performance evaluation can be biased by factors as limited number of positions of speaker and microphones, adopted set of impulse responses, vocabulary and grammars defining the recognition task. The purpose of this paper is to investigate in more detail some critical aspects that characterize such experimental context. To this purpose, our work addressed a microphone network distributed over different rooms of an apartment and a related set of speaker-microphone pairs leading to a very large set of impulse responses. Besides simulations, the experiments also tackled real speech interactions. The performance evaluation was based on a phone-loop task, in order to minimize the influence of linguistic constraints. The experimental results show how less critical is an accurate selection of impulse responses, if compared to other factors as the signal-to-noise ratio introduced by additive background noise.


Pages: 1028-1032

Number of pages: 5

