Blind Speech Separation and Dereverberation using neural beamforming

Cited by: 3

Authors:

Pfeifenberger, Lukas [1]

Pernkopf, Franz [1]

Affiliations:

[1] Graz Univ Technol, Signal Proc & Speech Commun Lab, Intelligent Syst Grp, Inffeldgasse 16c, Graz, Austria

Source:

SPEECH COMMUNICATION | 2022 / Vol. 140

Funding:

Austrian Science Fund;

Keywords:

Multi-channel speaker separation; Beamforming; Dereverberation; Speaker identification; Triplet mining; MASK ESTIMATION; NETWORK; EMBEDDINGS;

DOI:

10.1016/j.specom.2022.03.004

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we present the Blind Speech Separation and Dereverberation (BSSD) network, which performs simultaneous speaker separation, dereverberation and speaker identification in a single neural network. Speaker separation is guided by a set of predefined spatial cues. Dereverberation is performed by using neural beamforming, and speaker identification is aided by embedding vectors and triplet mining. We introduce a frequency-domain model which uses complex-valued neural networks, and a time-domain variant which performs beamforming in latent space. Further, we propose a block-online mode to process longer audio recordings, as they occur in meeting scenarios. We evaluate our system in terms of Scale Independent Signal to Distortion Ratio (SI-SDR), Word Error Rate (WER) and Equal Error Rate (EER).
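The SI-SDR metric named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition (often called scale-invariant SDR): the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The function name `si_sdr` and the plain-list inputs are illustrative choices, not taken from the paper, and no mean removal is applied, although some implementations add it.

```python
import math


def si_sdr(estimate, reference):
    """Return the SI-SDR in dB of `estimate` with respect to `reference`.

    Hypothetical helper for illustration; both inputs are equal-length
    sequences of real-valued samples.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal has zero energy")

    # Optimal scaling of the reference: this step makes the metric
    # insensitive to the gain of the estimate.
    alpha = dot / ref_energy
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)

    if residual_energy == 0.0:
        return math.inf   # estimate is an exact scaled copy of the reference
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)


if __name__ == "__main__":
    reference = [1.0, 0.0, -1.0, 0.0]
    estimate = [1.0, 1.0, -1.0, 1.0]
    print(si_sdr(estimate, reference))
    # Rescaling the estimate leaves the value unchanged.
    print(si_sdr([2.0 * x for x in estimate], reference))
```

Because of the projection step, multiplying the estimate by any nonzero gain does not change the result, which is why the metric suits separation systems whose output level is arbitrary.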

Pages: 29 - 41

Number of pages: 13
