Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition

被引:0
|
作者
Sivasankaran, Sunit [1 ]
Vincent, Emmanuel [1 ]
Fohr, Dominique [1 ]
机构
[1] Univ Lorraine, CNRS, INRIA, LORIA, F-54000 Nancy, France
关键词
Multichannel speech separation; WSJ0-2mix reverberated;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
We investigate the effect of speaker localization on the performance of speech recognition systems in a multispeaker, multichannel environment. Given the speaker location information, speech separation is performed in three stages. In the first stage, a simple delay-and-sum (DS) beamformer is used to enhance the signal impinging from the speaker location which is then used to estimate a time-frequency mask corresponding to the localized speaker using a neural network. This mask is used to compute the second order statistics and to derive an adaptive beamformer in the third stage. We generated a multichannel, multispeaker, reverberated, noisy dataset inspired from the well studied WSJ0-2mix and study the performance of the proposed pipeline in terms of the word error rate (WER). An average WER of 29.4% was achieved using the ground truth localization information and 42.4% using the localization information estimated via GCC-PHAT. Though higher signal-to-interference ratio (SIR) between the speakers was found to positively impact the speech separation performance, equivalent performances were obtained for mixtures with lower SIR values when the speakers are well separated in space.
引用
收藏
页码:346 / 350
页数:5
相关论文
共 50 条
  • [1] Impact of Emotional Speech to Automatic Speaker Recognition - Experiments on GEES Speech Database
    Jokic, Ivan
    Jokic, Stevan
    Delic, Vlado
    Peric, Zoran
    [J]. SPEECH AND COMPUTER, 2014, 8773 : 268 - 275
  • [2] ADAPTING TO THE SPEAKER IN AUTOMATIC SPEECH RECOGNITION
    TALBOT, M
    [J]. INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES, 1987, 27 (04): : 449 - 457
  • [3] Automatic speaker recognition with crosslanguage speech material
    Kuenzel, Hermann J.
    [J]. INTERNATIONAL JOURNAL OF SPEECH LANGUAGE AND THE LAW, 2013, 20 (01) : 21 - 44
  • [4] SIMILARITY MEASURE FOR AUTOMATIC SPEECH AND SPEAKER RECOGNITION
    SCHROEDER, MR
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1968, 43 (02): : 375 - +
  • [5] Methodologies for the evaluation of Speaker Diarization and Automatic Speech Recognition in the presence of overlapping speech
    Galibert, Olivier
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1130 - 1133
  • [6] AUTOMATIC SPEAKER AUTHENTICATION USING SPEECH RECOGNITION TECHNIQUES
    MEEKER, WF
    MARTIN, TB
    HERSCHER, MB
    PHYFE, D
    WEINSTOCK, M
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1967, 42 (05): : 1182 - &
  • [7] Research on automatic speaker recognition based on speech clustering
    Xu, Limin
    Qian, Bo
    Cheng, Weiming
    Tang, Zhenmin
    [J]. ICICIC 2006: FIRST INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING, INFORMATION AND CONTROL, VOL 2, PROCEEDINGS, 2006, : 105 - +
  • [8] Correlation Networks for Speaker Normalization in Automatic Speech Recognition
    Sharon, Rini A.
    Kothinti, Sandeep Reddy
    Umesh, Srinivasan
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 882 - 886
  • [9] Forensic Automatic Speaker Recognition with Degraded and Enhanced Speech
    Kuenzel, Hermann
    Alexander, Paul
    [J]. JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2014, 62 (04): : 244 - 253
  • [10] Speaker-Invariant Features for Automatic Speech Recognition
    Umesh, S.
    Sanand, D. R.
    Praveen, G.
    [J]. 20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1738 - 1743