A unified network for multi-speaker speech recognition with multi-channel recordings

被引：0

作者：

Liu, Conggui ^{[1
]}

Inoue, Nakamasa ^{[1
]}

Shinoda, Koichi ^{[1
]}

机构：

[1] Tokyo Inst Technol, Tokyo, Japan

来源：

2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017) | 2017年

关键词：

D O I：

暂无

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

Despite the recent progress in speech recognition, meeting speech recognition is still a challenging task, since it is often difficult to separate one speaker's voice from the others in meetings. In this paper, we propose a joint training framework of speaker separation and speech recognition with multi-channel recordings for this purpose. The location of each speaker is first estimated and then used to recover her/his original speech in a delay-and-subtraction (DAS) algorithm. The two components, speaker separation and speech recognition, are represented by one deep net, which is optimized as a whole using training data. We evaluated our method using simulated data generated from WSJCAMO database. Compared with the independent training of the two components, our proposed method improved word accuracy by 15.2% when the locations of speakers are known, and by 53.6% when the locations of speakers are unknown.

引用

页码：1304 / 1307

页数：4

共 50 条

[1] MIMO-SPEECH: END-TO-END MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION
Chang, Xuankai
Zhang, Wangyou
Qian, Yanmin
Le Roux, Jonathan
Watanabe, Shinji
[J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 237 - 244
[2] A Multi-channel/Multi-speaker Articulatory Database in Mandarin for Speech Visualization
Zhang, Dan
Liu, Xianqian
Yan, Nan
Wang, Lan
Zhu, Yun
Chen, Hui
[J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 299 - +
[3] SPEAKER ADAPTED BEAMFORMING FOR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION
Menne, Tobias
Schlueter, Ralf
Ney, Hermann
[J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 535 - 541
[4] The segmentation of multi-channel meeting recordings for automatic speech recognition
Dines, John
Vepa, Jithendra
Hain, Thomas
[J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1213 - +
[5] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION
Settle, Shane
Le Roux, Jonathan
Hori, Takaaki
Watanabe, Shinji
Hershey, John R.
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4819 - 4823
[6] Speech Recognition and Multi-Speaker Diarization of Long Conversations
Mao, Huanru Henry
Li, Shuyang
McAuley, Julian
Cottrell, Garrison W.
[J]. INTERSPEECH 2020, 2020, : 691 - 695
[7] A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin
Yu, Jun
Su, Rongfeng
Wang, Lan
Zhou, Wenpeng
[J]. 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
[8] End-to-End Multilingual Multi-Speaker Speech Recognition
Seki, Hiroshi
Hori, Takaaki
Watanabe, Shinji
Le Roux, Jonathan
Hershey, John R.
[J]. INTERSPEECH 2019, 2019, : 3755 - 3759
[9] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION WITH TRANSFORMER
Chang, Xuankai
Zhang, Wangyou
Qian, Yanmin
Le Roux, Jonathan
Watanabe, Shinji
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6134 - 6138
[10] SuperFormer: Enhanced Multi-Speaker Speech Separation Network Combining Channel and Spatial Adaptability
Jiang, Yanji
Qiu, Youli
Shen, Xueli
Sun, Chuan
Liu, Haitao
[J]. APPLIED SCIENCES-BASEL, 2022, 12 (15):

← 1 2 3 4 5 →