Blind Speech Separation and Dereverberation using neural beamforming

Cited by: 3
Authors
Pfeifenberger, Lukas [1]
Pernkopf, Franz [1]
Affiliation
[1] Graz Univ Technol, Signal Proc & Speech Commun Lab, Intelligent Syst Grp, Inffeldgasse 16c, Graz, Austria
Funding
Austrian Science Fund (FWF)
Keywords
Multi-channel speaker separation; Beamforming; Dereverberation; Speaker identification; Triplet mining; Mask estimation; Network; Embeddings
DOI
10.1016/j.specom.2022.03.004
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we present the Blind Speech Separation and Dereverberation (BSSD) network, which performs simultaneous speaker separation, dereverberation and speaker identification in a single neural network. Speaker separation is guided by a set of predefined spatial cues. Dereverberation is performed by using neural beamforming, and speaker identification is aided by embedding vectors and triplet mining. We introduce a frequency-domain model which uses complex-valued neural networks, and a time-domain variant which performs beamforming in latent space. Further, we propose a block-online mode to process longer audio recordings, as they occur in meeting scenarios. We evaluate our system in terms of Scale Independent Signal to Distortion Ratio (SI-SDR), Word Error Rate (WER) and Equal Error Rate (EER).
Pages: 29-41
Number of pages: 13