Blind Speech Separation and Enhancement With GCC-NMF

Cited by: 40
Authors
Wood, Sean U. N. [1]
Rouat, Jean [1]
Dupont, Stephane [2]
Pironkov, Gueorgui [2]
Affiliations
[1] Univ Sherbrooke, Dept Elect & Comp Engn, NECOTIS, Sherbrooke, PQ J1K 2R1, Canada
[2] Univ Mons, Dept Theory Circuits & Signal Proc, B-7000 Mons, Belgium
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Blind speech separation; CASA; cocktail party problem; GCC; interaural time difference; NMF; PHAT; NONNEGATIVE MATRIX FACTORIZATION; AUDIO SOURCE SEPARATION; INFORMATION; MODELS;
DOI
10.1109/TASLP.2017.2656805
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We present a blind source separation algorithm named GCC-NMF that combines unsupervised dictionary learning via non-negative matrix factorization (NMF) with spatial localization via the generalized cross correlation (GCC) method. Dictionary learning is performed on the mixture signal, with separation subsequently achieved by grouping dictionary atoms, at each point in time, according to their spatial origins. The resulting source separation algorithm is simple yet flexible, requiring no prior knowledge or information. Separation quality is evaluated for three tasks using stereo recordings from the publicly available SiSEC signal separation evaluation campaign: 3 and 4 concurrent speakers in reverberant environments, speech mixed with real-world background noise, and noisy recordings of a moving speaker. Performance is quantified using perceptually motivated and SNR-based measures with the PEASS and BSS Eval toolkits, respectively. We evaluate the effects of model parameters on separation quality, and compare our approach with other unsupervised and semi-supervised speech separation and enhancement approaches. We show that GCC-NMF is a flexible source separation algorithm, outperforming task-specific approaches in each of the three settings, including both blind as well as several informed approaches that require prior knowledge or information.
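The atom-grouping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a non-negative dictionary `W` (frequency x atoms) and activations `H` (atoms x frames) have already been learned by NMF on the mixture spectrogram, and that the source TDOAs are known or have been estimated separately. The function names `atom_tdoas` and `source_masks`, the sign convention for the delay, and the nearest-TDOA assignment rule are all assumptions made for illustration.

```python
import numpy as np

def atom_tdoas(W, X_left, X_right, freqs_hz, taus_s):
    """Estimate a TDOA for every dictionary atom at every frame.

    Assumed shapes: W (F, K) non-negative, X_left and X_right (F, T)
    complex STFTs, freqs_hz (F,), taus_s (D,) candidate delays in seconds.
    Convention (an assumption): a positive delay means the right channel lags.
    """
    cross = X_left * np.conj(X_right)                # cross-spectrum, (F, T)
    phat = cross / np.maximum(np.abs(cross), 1e-12)  # PHAT weighting: keep phase only
    steer = np.exp(-2j * np.pi * np.outer(freqs_hz, taus_s))  # (F, D)
    # gcc[k, d, t] = sum_f W[f, k] * Re(phat[f, t] * steer[f, d]):
    # a GCC-PHAT function in which each atom's spectrum weights the frequencies.
    gcc = np.einsum("fk,fd,ft->kdt", W, steer, phat).real
    return taus_s[np.argmax(gcc, axis=1)]            # (K, T)

def source_masks(W, H, tdoas, source_taus_s):
    """Build one soft time-frequency mask per source.

    Each (atom, frame) pair is assigned to the source whose TDOA is nearest
    to that atom's estimate; the mask is the assigned share of W @ H.
    """
    source_taus_s = np.asarray(source_taus_s, dtype=float)
    dist = np.abs(tdoas[None, :, :] - source_taus_s[:, None, None])
    nearest = np.argmin(dist, axis=0)                # (K, T) source index
    total = W @ H + 1e-12                            # avoid division by zero
    return np.stack([(W @ (H * (nearest == s))) / total
                     for s in range(len(source_taus_s))])  # (S, F, T)
```

Multiplying a mask by the mixture STFT and inverting it would give an estimate of the corresponding source. Learning `W` and `H` and computing the STFT are left out of the sketch.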
Pages: 745 - 755
Page count: 11
Related papers
50 in total
  • [21] Speech Enhancement via Combination of Wiener Filter and Blind Source Separation
    Hu, Hongmei
    Taghia, Jalil
    Sang, Jinqiu
    Taghia, Jalal
    Mohammadiha, Nasser
    Azarpour, Masoumeh
    Dokku, Raiyalakshmi
    Wang, Shouyan
    Lutman, Mark E.
    Bleeck, Stefan
    PRACTICAL APPLICATIONS OF INTELLIGENT SYSTEMS, 2011, 124 : 485 - +
  • [22] A Review on Speech Separation using NMF and Its Extensions
    Pham, Tuan
    Lee, Yuan-Shan
    Chen, Yu-An
    Wang, Jia-Ching
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT), 2015, : 26 - 29
  • [23] SPEECH ENHANCEMENT COMBINING STATISTICAL MODELS AND NMF WITH UPDATE OF SPEECH AND NOISE BASES
    Kwon, Kisoo
    Shin, Jong Won
    Sonowal, Sukanya
    Choi, Inkyu
    Kim, Nam Soo
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [24] Speech Enhancement Combining NMF Weighted by Speech Presence Probability and Statistical Model
    Hu, Yonggang
    Zhang, Xiongwei
    Zou, Xia
    Min, Gang
    Sun, Meng
    Zheng, Yunfei
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2015, E98A (12) : 2701 - 2704
  • [25] Speech enhancement using posterior regularized NMF with bases update
    Sunnydayal, V.
    Kumar, T. Kishore
    COMPUTERS & ELECTRICAL ENGINEERING, 2017, 62 : 663 - 675
  • [26] A Blind Source Separation Based Approach for Speech Enhancement in Noisy and Reverberant Environment
    Pignotti, Alessio
    Marcozzi, Daniele
    Cifani, Simone
    Squartini, Stefano
    Piazza, Francesco
    CROSS-MODAL ANALYSIS OF SPEECH, GESTURES, GAZE AND FACIAL EXPRESSIONS, 2009, 5641 : 356 - 367
  • [27] NMF-Based Speech Enhancement Using Bases Update
    Kwon, Kisoo
    Shin, Jong Won
    Kim, Nam Soo
    IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (04) : 450 - 454
  • [28] Research on Speech Enhancement Algorithms Based on Blind Source Separation in Outdoor Environment
    Wang, Chunli
    Wang, Quanyu
    CYBER SECURITY INTELLIGENCE AND ANALYTICS, 2020, 928 : 837 - 842
  • [29] SPEECH ENHANCEMENT USING β-DIVERGENCE BASED NMF WITH UPDATE BASES
    Sunnydayal, V.
    Kumar, T. Kishore
    2016 INTERNATIONAL CONFERENCE ON MICROELECTRONICS, COMPUTING AND COMMUNICATIONS (MICROCOM), 2016,
  • [30] Phoneme-dependent NMF for speech enhancement in monaural mixtures
    Raj, Bhiksha
    Singh, Rita
    Virtanen, Tuomas
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1224 - +