Blind Speech Separation and Enhancement With GCC-NMF

Cited by: 40

Authors:

Wood, Sean U. N. [1]

Rouat, Jean [1]

Dupont, Stephane [2]

Pironkov, Gueorgui [2]

Affiliations:

[1] Univ Sherbrooke, Dept Elect & Comp Engn, NECOTIS, Sherbrooke, PQ J1K 2R1, Canada

[2] Univ Mons, Dept Theory Circuits & Signal Proc, B-7000 Mons, Belgium

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2017, Vol. 25, No. 04

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Blind speech separation; CASA; cocktail party problem; GCC; interaural time difference; NMF; PHAT; NONNEGATIVE MATRIX FACTORIZATION; AUDIO SOURCE SEPARATION; INFORMATION; MODELS;

DOI:

10.1109/TASLP.2017.2656805

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

We present a blind source separation algorithm named GCC-NMF that combines unsupervised dictionary learning via non-negative matrix factorization (NMF) with spatial localization via the generalized cross correlation (GCC) method. Dictionary learning is performed on the mixture signal, with separation subsequently achieved by grouping dictionary atoms, at each point in time, according to their spatial origins. The resulting source separation algorithm is simple yet flexible, requiring no prior knowledge or information. Separation quality is evaluated for three tasks using stereo recordings from the publicly available SiSEC signal separation evaluation campaign: 3 and 4 concurrent speakers in reverberant environments, speech mixed with real-world background noise, and noisy recordings of a moving speaker. Performance is quantified using perceptually motivated and SNR-based measures with the PEASS and BSS Eval toolkits, respectively. We evaluate the effects of model parameters on separation quality, and compare our approach with other unsupervised and semi-supervised speech separation and enhancement approaches. We show that GCC-NMF is a flexible source separation algorithm, outperforming task-specific approaches in each of the three settings, including both blind as well as several informed approaches that require prior knowledge or information.
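The pipeline described in the abstract (dictionary learning on the mixture, then grouping atoms at each time frame by spatial origin) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`stft`, `nmf`, `gcc_nmf_masks`), the Euclidean NMF cost, the averaging of the two channels' activations, and all parameter values are assumptions made for illustration. The sketch also takes the target time delays as inputs, whereas a blind system would have to estimate them from the mixture.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT, returned with shape (frequency, time)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

def nmf(V, n_atoms, n_iter=50, seed=0, eps=1e-9):
    """Euclidean NMF via multiplicative updates, V ~= W @ H (assumed cost)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_atoms)) + eps
    H = rng.random((n_atoms, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def gcc_nmf_masks(XL, XR, target_taus, fs, n_fft,
                  n_atoms=16, max_tau=5e-4, n_taus=41, eps=1e-9):
    """Return one soft mask per target delay, shape (sources, freq, time)."""
    T = XL.shape[1]
    # 1) Learn the dictionary on the mixture itself: the magnitude
    #    spectrograms of both channels are concatenated along time.
    W, H2 = nmf(np.concatenate([np.abs(XL), np.abs(XR)], axis=1), n_atoms)
    H = 0.5 * (H2[:, :T] + H2[:, T:])  # assumption: average channel activations
    # 2) PHAT-weighted cross-spectrum (magnitude removed, phase kept).
    cross = XL * np.conj(XR)
    phat = cross / (np.abs(cross) + eps)
    # 3) GCC per atom: weight each frequency by the atom's spectrum.
    #    tau > 0 means the right channel lags the left channel.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    taus = np.linspace(-max_tau, max_tau, n_taus)
    steer = np.exp(-2j * np.pi * freqs[:, None] * taus[None, :])
    G = np.einsum('fk,fd,ft->kdt', W, steer, phat).real
    tau_hat = taus[np.argmax(G, axis=1)]  # (atoms, time)
    # 4) At each frame, give each atom to the nearest target delay.
    targets = np.asarray(target_taus)[:, None, None]
    assign = np.argmin(np.abs(tau_hat[None] - targets), axis=0)
    # 5) Build masks from the atoms assigned to each source.
    denom = W @ H + eps
    return np.stack([(W @ (H * (assign == s))) / denom
                     for s in range(len(target_taus))])

# Toy stereo mixture: two noise sources with opposite inter-channel delays.
fs, n_fft, d = 16000, 256, 3
rng = np.random.default_rng(1)
s1, s2 = rng.standard_normal(8000), rng.standard_normal(8000)
left = s1 + np.roll(s2, d)    # s2 reaches the left channel d samples late
right = np.roll(s1, d) + s2   # s1 reaches the right channel d samples late
XL, XR = stft(left), stft(right)
masks = gcc_nmf_masks(XL, XR, [d / fs, -d / fs], fs, n_fft)
estimates = masks * XL        # masked spectrograms, one per source
```

The toy mixture only exercises the code path: two white-noise sources overlap in every time-frequency bin, so the result says nothing about separation quality. An inverse STFT, omitted here, would be needed to recover time-domain signals from `estimates`.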


Pages: 745-755

Page count: 11

50 items in total

[21] Speech Enhancement via Combination of Wiener Filter and Blind Source Separation
Hu, Hongmei
Taghia, Jalil
Sang, Jinqiu
Taghia, Jalal
Mohammadiha, Nasser
Azarpour, Masoumeh
Dokku, Raiyalakshmi
Wang, Shouyan
Lutman, Mark E.
Bleeck, Stefan
PRACTICAL APPLICATIONS OF INTELLIGENT SYSTEMS, 2011, 124 : 485 - +
[22] A Review on Speech Separation using NMF and Its Extensions
Pham, Tuan
Lee, Yuan-Shan
Chen, Yu-An
Wang, Jia-Ching
PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT), 2015, : 26 - 29
[23] SPEECH ENHANCEMENT COMBINING STATISTICAL MODELS AND NMF WITH UPDATE OF SPEECH AND NOISE BASES
Kwon, Kisoo
Shin, Jong Won
Sonowal, Sukanya
Choi, Inkyu
Kim, Nam Soo
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[24] Speech Enhancement Combining NMF Weighted by Speech Presence Probability and Statistical Model
Hu, Yonggang
Zhang, Xiongwei
Zou, Xia
Min, Gang
Sun, Meng
Zheng, Yunfei
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2015, E98A (12) : 2701 - 2704
[25] Speech enhancement using posterior regularized NMF with bases update
Sunnydayal, V.
Kumar, T. Kishore
COMPUTERS & ELECTRICAL ENGINEERING, 2017, 62 : 663 - 675
[26] A Blind Source Separation Based Approach for Speech Enhancement in Noisy and Reverberant Environment
Pignotti, Alessio
Marcozzi, Daniele
Cifani, Simone
Squartini, Stefano
Piazza, Francesco
CROSS-MODAL ANALYSIS OF SPEECH, GESTURES, GAZE AND FACIAL EXPRESSIONS, 2009, 5641 : 356 - 367
[27] NMF-Based Speech Enhancement Using Bases Update
Kwon, Kisoo
Shin, Jong Won
Kim, Nam Soo
IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (04) : 450 - 454
[28] Research on Speech Enhancement Algorithms Based on Blind Source Separation in Outdoor Environment
Wang, Chunli
Wang, Quanyu
CYBER SECURITY INTELLIGENCE AND ANALYTICS, 2020, 928 : 837 - 842
[29] SPEECH ENHANCEMENT USING β- DIVERGENCE BASED NMF WITH UPDATE BASES
Sunnydayal, V.
Kumar, T. Kishore
2016 INTERNATIONAL CONFERENCE ON MICROELECTRONICS, COMPUTING AND COMMUNICATIONS (MICROCOM), 2016,
[30] Phoneme-dependent NMF for speech enhancement in monaural mixtures
Raj, Bhiksha
Singh, Rita
Virtanen, Tuomas
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1224 - +
