Blind Speech Separation and Dereverberation using neural beamforming

Cited by: 3
Authors
Pfeifenberger, Lukas [1]
Pernkopf, Franz [1]
Affiliations
[1] Graz Univ Technol, Signal Proc & Speech Commun Lab, Intelligent Syst Grp, Inffeldgasse 16c, Graz, Austria
Funding
Austrian Science Fund
Keywords
Multi-channel speaker separation; Beamforming; Dereverberation; Speaker identification; Triplet mining; Mask estimation; Network; Embeddings
DOI
10.1016/j.specom.2022.03.004
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we present the Blind Speech Separation and Dereverberation (BSSD) network, which performs simultaneous speaker separation, dereverberation, and speaker identification in a single neural network. Speaker separation is guided by a set of predefined spatial cues. Dereverberation is performed using neural beamforming, and speaker identification is aided by embedding vectors and triplet mining. We introduce a frequency-domain model that uses complex-valued neural networks, and a time-domain variant that performs beamforming in latent space. Further, we propose a block-online mode to process longer audio recordings, such as those that occur in meeting scenarios. We evaluate our system in terms of Scale-Independent Signal-to-Distortion Ratio (SI-SDR), Word Error Rate (WER), and Equal Error Rate (EER).
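As an illustration of the first evaluation metric named in the abstract, the following is a minimal sketch of the commonly used SI-SDR definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. This is not taken from the paper; the function name `si_sdr`, the plain-list inputs, and the omission of mean removal are assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference):
    """Return the SI-SDR in dB of `estimate` with respect to `reference`.

    Hypothetical helper for illustration only; both inputs are
    equal-length sequences of real-valued samples.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference must contain non-zero samples")

    # Optimal scaling factor: projection of the estimate onto the reference.
    dot = sum(e * r for e, r in zip(estimate, reference))
    alpha = dot / ref_energy

    # Scaled target component and the residual (distortion) component.
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf  # estimate is an exact scaled copy of the reference

    return 10.0 * math.log10(target_energy / residual_energy)


reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]
score = si_sdr(estimate, reference)

# Rescaling the estimate should leave the score essentially unchanged.
scaled_score = si_sdr([3.0 * e for e in estimate], reference)
```

Because the scaling factor is recomputed for each estimate, multiplying the estimate by a constant gain does not change the score, which is the property the metric's name refers to.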
Pages: 29-41
Number of pages: 13