Blind Speech Separation and Dereverberation using neural beamforming

Cited by: 3
Authors
Pfeifenberger, Lukas [1]
Pernkopf, Franz [1]
Affiliations
[1] Graz Univ Technol, Signal Proc & Speech Commun Lab, Intelligent Syst Grp, Inffeldgasse 16c, Graz, Austria
Funding
Austrian Science Fund
Keywords
Multi-channel speaker separation; Beamforming; Dereverberation; Speaker identification; Triplet mining; Mask estimation; Network; Embeddings
DOI
10.1016/j.specom.2022.03.004
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we present the Blind Speech Separation and Dereverberation (BSSD) network, which performs simultaneous speaker separation, dereverberation, and speaker identification in a single neural network. Speaker separation is guided by a set of predefined spatial cues. Dereverberation is performed using neural beamforming, and speaker identification is aided by embedding vectors and triplet mining. We introduce a frequency-domain model which uses complex-valued neural networks, and a time-domain variant which performs beamforming in latent space. Further, we propose a block-online mode to process longer audio recordings, such as those that occur in meeting scenarios. We evaluate our system in terms of Scale-Independent Signal-to-Distortion Ratio (SI-SDR), Word Error Rate (WER), and Equal Error Rate (EER).
Pages: 29-41
Number of pages: 13