Blind Speech Separation and Enhancement With GCC-NMF

Cited by: 40
Authors
Wood, Sean U. N. [1 ]
Rouat, Jean [1 ]
Dupont, Stephane [2 ]
Pironkov, Gueorgui [2 ]
Affiliations
[1] Univ Sherbrooke, Dept Elect & Comp Engn, NECOTIS, Sherbrooke, PQ J1K 2R1, Canada
[2] Univ Mons, Dept Theory Circuits & Signal Proc, B-7000 Mons, Belgium
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Blind speech separation; CASA; cocktail party problem; GCC; interaural time difference; NMF; PHAT; NONNEGATIVE MATRIX FACTORIZATION; AUDIO SOURCE SEPARATION; INFORMATION; MODELS;
DOI
10.1109/TASLP.2017.2656805
Chinese Library Classification
O42 [Acoustics];
Discipline Classification Codes
070206; 082403;
Abstract
We present a blind source separation algorithm named GCC-NMF that combines unsupervised dictionary learning via non-negative matrix factorization (NMF) with spatial localization via the generalized cross correlation (GCC) method. Dictionary learning is performed on the mixture signal, with separation subsequently achieved by grouping dictionary atoms, at each point in time, according to their spatial origins. The resulting source separation algorithm is simple yet flexible, requiring no prior knowledge or information. Separation quality is evaluated for three tasks using stereo recordings from the publicly available SiSEC signal separation evaluation campaign: 3 and 4 concurrent speakers in reverberant environments, speech mixed with real-world background noise, and noisy recordings of a moving speaker. Performance is quantified using perceptually motivated and SNR-based measures with the PEASS and BSS Eval toolkits, respectively. We evaluate the effects of model parameters on separation quality, and compare our approach with other unsupervised and semi-supervised speech separation and enhancement approaches. We show that GCC-NMF is a flexible source separation algorithm, outperforming task-specific approaches in each of the three settings, including blind approaches as well as several informed approaches that require prior knowledge or information.
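For illustration, the following minimal Python sketch shows the pipeline the abstract describes, using scipy for the STFT and scikit-learn for NMF. It is a sketch under stated assumptions, not the authors' implementation: the function name gcc_nmf_separate, the candidate-delay grid, the fixed TDOA tolerance, and the assumption that the target source's time difference of arrival (TDOA) is already known are all illustrative choices.

# Illustrative sketch of the GCC-NMF idea (assumptions noted above):
# learn an NMF dictionary on the mixture spectrogram, estimate each
# atom's per-frame TDOA with GCC-PHAT weighting, and keep the atoms
# whose TDOA falls near the target source's TDOA.
import numpy as np
from scipy.signal import stft, istft
from sklearn.decomposition import NMF

def gcc_nmf_separate(x_left, x_right, fs, target_tdoa, n_atoms=64,
                     n_fft=1024, tdoa_tolerance=2e-4):
    """Estimate the source near `target_tdoa` (seconds) from a stereo mix."""
    f, t, XL = stft(x_left, fs, nperseg=n_fft)
    _, _, XR = stft(x_right, fs, nperseg=n_fft)

    # Unsupervised dictionary learning on the mixture magnitude spectrogram.
    V = np.abs(XL) + np.abs(XR)                       # (freq, time)
    nmf = NMF(n_components=n_atoms, init='random', max_iter=200, random_state=0)
    W = nmf.fit_transform(V)                          # atoms:       (freq, atoms)
    H = nmf.components_                               # activations: (atoms, time)

    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross = XL * np.conj(XR)
    phat = cross / (np.abs(cross) + 1e-12)            # unit magnitude per bin

    # Candidate delays and their steering phases; GCC-PHAT peaks at the
    # true delay when summing phat * exp(+j 2 pi f tau) over frequency.
    taus = np.linspace(-1e-3, 1e-3, 101)              # candidate TDOAs (s)
    steering = np.exp(2j * np.pi * np.outer(taus, f)) # (delays, freq)

    mask = np.zeros_like(V)
    for k in range(n_atoms):
        # Atom-weighted GCC-PHAT function, one column per frame.
        gcc = np.real(steering @ (W[:, [k]] * phat))  # (delays, time)
        atom_tdoa = taus[np.argmax(gcc, axis=0)]      # per-frame TDOA of atom k
        keep = np.abs(atom_tdoa - target_tdoa) < tdoa_tolerance
        # Add atom k's contribution in the frames assigned to the target.
        mask += np.outer(W[:, k], H[k] * keep)

    mask = mask / (W @ H + 1e-12)                     # Wiener-like ratio mask
    _, y = istft(mask * XL, fs, nperseg=n_fft)
    return y

In the full algorithm, target TDOAs are themselves found as peaks of the GCC-PHAT angular spectrum, so no prior knowledge is needed; passing target_tdoa explicitly here simply keeps the sketch self-contained.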
Pages: 745-755
Number of pages: 11
Related papers
50 records in total
  • [1] Blind Speech Separation with GCC-NMF
    Wood, Sean U. N.
    Rouat, Jean
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 3329-3333
  • [2] Real-time Speech Enhancement with GCC-NMF
    Wood, Sean U. N.
    Rouat, Jean
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 2665-2669
  • [3] Real-time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA Jetson
    Wood, Sean U. N.
    Rouat, Jean
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 2048-2049
  • [4] Research on speech separation based on GCC-NMF
    Wu Junqin
    Wang Yingfu
    Journal of Jiangxi University of Science and Technology, 2020, 41 (05): 65-72
  • [5] Unsupervised Low Latency Speech Enhancement With RT-GCC-NMF
    Wood, Sean U. N.
    Rouat, Jean
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (02): 332-346
  • [6] Single Channel Blind Source Separation Based on NMF and Its Application to Speech Enhancement
    Chen, Yongqiang
    2017 IEEE 9TH INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS (ICCSN), 2017: 1066-1069
  • [7] DEEP NMF FOR SPEECH SEPARATION
    Le Roux, Jonathan
    Hershey, John R.
    Weninger, Felix
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 66-70
  • [8] Beamspace blind signal separation for speech enhancement
    Low, Siow Yong
    Yiu, Ka-Fai Cedric
    Nordholm, Sven
    OPTIMIZATION AND ENGINEERING, 2009, 10 (02): 313-330
  • [9] Online Parametric NMF for Speech Enhancement
    Kavalekalam, Mathew Shaji
    Nielsen, Jesper Kjaer
    Shi, Liming
    Christensen, Mads Graesboll
    Boldt, Jesper
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018: 2320-2324