A unified network for multi-speaker speech recognition with multi-channel recordings

被引:0
|
作者
Liu, Conggui [1 ]
Inoue, Nakamasa [1 ]
Shinoda, Koichi [1 ]
机构
[1] Tokyo Inst Technol, Tokyo, Japan
关键词
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Despite the recent progress in speech recognition, meeting speech recognition is still a challenging task, since it is often difficult to separate one speaker's voice from the others in meetings. In this paper, we propose a joint training framework of speaker separation and speech recognition with multi-channel recordings for this purpose. The location of each speaker is first estimated and then used to recover her/his original speech in a delay-and-subtraction (DAS) algorithm. The two components, speaker separation and speech recognition, are represented by one deep net, which is optimized as a whole using training data. We evaluated our method using simulated data generated from WSJCAMO database. Compared with the independent training of the two components, our proposed method improved word accuracy by 15.2% when the locations of speakers are known, and by 53.6% when the locations of speakers are unknown.
引用
收藏
页码:1304 / 1307
页数:4
相关论文
共 50 条
  • [1] MIMO-SPEECH: END-TO-END MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 237 - 244
  • [2] A Multi-channel/Multi-speaker Articulatory Database in Mandarin for Speech Visualization
    Zhang, Dan
    Liu, Xianqian
    Yan, Nan
    Wang, Lan
    Zhu, Yun
    Chen, Hui
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 299 - +
  • [3] SPEAKER ADAPTED BEAMFORMING FOR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION
    Menne, Tobias
    Schlueter, Ralf
    Ney, Hermann
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 535 - 541
  • [4] The segmentation of multi-channel meeting recordings for automatic speech recognition
    Dines, John
    Vepa, Jithendra
    Hain, Thomas
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1213 - +
  • [5] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION
    Settle, Shane
    Le Roux, Jonathan
    Hori, Takaaki
    Watanabe, Shinji
    Hershey, John R.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4819 - 4823
  • [6] Speech Recognition and Multi-Speaker Diarization of Long Conversations
    Mao, Huanru Henry
    Li, Shuyang
    McAuley, Julian
    Cottrell, Garrison W.
    [J]. INTERSPEECH 2020, 2020, : 691 - 695
  • [7] A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin
    Yu, Jun
    Su, Rongfeng
    Wang, Lan
    Zhou, Wenpeng
    [J]. 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [8] End-to-End Multilingual Multi-Speaker Speech Recognition
    Seki, Hiroshi
    Hori, Takaaki
    Watanabe, Shinji
    Le Roux, Jonathan
    Hershey, John R.
    [J]. INTERSPEECH 2019, 2019, : 3755 - 3759
  • [9] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION WITH TRANSFORMER
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6134 - 6138
  • [10] SuperFormer: Enhanced Multi-Speaker Speech Separation Network Combining Channel and Spatial Adaptability
    Jiang, Yanji
    Qiu, Youli
    Shen, Xueli
    Sun, Chuan
    Liu, Haitao
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (15):