Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition

Cited by: 0

Authors:

Sivasankaran, Sunit [1]

Vincent, Emmanuel [1]

Fohr, Dominique [1]

Affiliations:

[1] Univ Lorraine, CNRS, INRIA, LORIA, F-54000 Nancy, France

Source:

28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020) | 2021

Keywords:

Multichannel speech separation; WSJ0-2mix reverberated;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

We investigate the effect of speaker localization on the performance of speech recognition systems in a multispeaker, multichannel environment. Given the speaker location information, speech separation is performed in three stages. In the first stage, a simple delay-and-sum (DS) beamformer is used to enhance the signal impinging from the speaker location. In the second stage, the enhanced signal is used to estimate a time-frequency mask corresponding to the localized speaker using a neural network. In the third stage, this mask is used to compute the second-order statistics and to derive an adaptive beamformer. We generated a multichannel, multispeaker, reverberated, noisy dataset inspired by the well-studied WSJ0-2mix and studied the performance of the proposed pipeline in terms of the word error rate (WER). An average WER of 29.4% was achieved using the ground truth localization information and 42.4% using the localization information estimated via GCC-PHAT. Though a higher signal-to-interference ratio (SIR) between the speakers was found to positively impact the speech separation performance, equivalent performance was obtained for mixtures with lower SIR values when the speakers were well separated in space.
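The abstract names GCC-PHAT for localization and a delay-and-sum beamformer for the first stage. The following is a minimal NumPy sketch of both ideas for two channels, offered as an illustration only and not as the authors' implementation. The function names `gcc_phat` and `delay_and_sum`, the two-microphone setup, the integer-sample alignment, and the synthetic white-noise signal are all assumptions made here for the example.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds, with GCC-PHAT.

    Hypothetical helper for illustration; a positive value means y lags x.
    """
    n = len(x) + len(y)  # zero-pad so the correlation is not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


def delay_and_sum(channels, delays, fs):
    """Align channels by integer-sample delays and average them.

    Assumption: np.roll is used for simplicity, so samples wrap around
    at the edges; a real system would use fractional, non-circular delays.
    """
    shifts = [int(round(d * fs)) for d in delays]
    aligned = [np.roll(ch, -s) for ch, s in zip(channels, shifts)]
    return np.mean(aligned, axis=0)


# Synthetic check: the second channel is the first delayed by 5 samples.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.concatenate((np.zeros(5), x[:-5]))

tau = gcc_phat(x, y, fs)
enhanced = delay_and_sum([x, y], [0.0, tau], fs)
```

In this sketch the estimated delay plays the role of the localization information: an error in `tau` misaligns the channels before averaging, which is the kind of degradation the abstract studies at the recognition level.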


Pages: 346-350

Number of pages: 5
