Blind Speech Separation and Dereverberation using neural beamforming

Cited by: 3

Authors:

Pfeifenberger, Lukas [1]

Pernkopf, Franz [1]

Affiliations:

[1] Graz Univ Technol, Signal Proc & Speech Commun Lab, Intelligent Syst Grp, Inffeldgasse 16c, Graz, Austria

Source:

SPEECH COMMUNICATION | 2022 / Vol. 140

Funding:

Austrian Science Fund;

Keywords:

Multi-channel speaker separation; Beamforming; Dereverberation; Speaker identification; Triplet mining; MASK ESTIMATION; NETWORK; EMBEDDINGS;

DOI:

10.1016/j.specom.2022.03.004

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper, we present the Blind Speech Separation and Dereverberation (BSSD) network, which performs simultaneous speaker separation, dereverberation, and speaker identification in a single neural network. Speaker separation is guided by a set of predefined spatial cues. Dereverberation is performed using neural beamforming, and speaker identification is aided by embedding vectors and triplet mining. We introduce a frequency-domain model that uses complex-valued neural networks, and a time-domain variant that performs beamforming in latent space. Further, we propose a block-online mode to process longer audio recordings, such as those that occur in meeting scenarios. We evaluate our system in terms of Scale-Independent Signal-to-Distortion Ratio (SI-SDR), Word Error Rate (WER), and Equal Error Rate (EER).
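For readers unfamiliar with the first evaluation metric, the following is a minimal sketch of how SI-SDR is commonly computed for a single-channel estimate against a clean reference. It follows the widely used definition (project the estimate onto the reference, then compare the projected energy with the residual energy). The abstract does not give the authors' exact formula, so the mean removal step and the function name `si_sdr` are assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference):
    """Return the SI-SDR in dB of `estimate` with respect to `reference`.

    Assumption: both inputs are equal-length sequences of real samples,
    and both are made zero-mean before the projection (a common
    convention, not confirmed by the abstract).
    """
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0:
        raise ValueError("reference has zero energy after mean removal")

    # Optimal scaling of the reference that best explains the estimate.
    scale = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [scale * r for r in ref]

    # Everything in the estimate that the scaled reference cannot explain.
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)

    if residual_energy == 0:
        return math.inf
    if target_energy == 0:
        return -math.inf
    return 10 * math.log10(target_energy / residual_energy)
```

Because the reference is rescaled before the comparison, multiplying the estimate by any non-zero constant leaves the score unchanged, which is the property the metric's name refers to.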

Pages: 29-41

Number of pages: 13
