Blind Speech Separation and Enhancement With GCC-NMF

Cited by: 40
Authors
Wood, Sean U. N. [1 ]
Rouat, Jean [1 ]
Dupont, Stephane [2 ]
Pironkov, Gueorgui [2 ]
Affiliations
[1] Univ Sherbrooke, Dept Elect & Comp Engn, NECOTIS, Sherbrooke, PQ J1K 2R1, Canada
[2] Univ Mons, Dept Theory Circuits & Signal Proc, B-7000 Mons, Belgium
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Blind speech separation; CASA; cocktail party problem; GCC; interaural time difference; NMF; PHAT; NONNEGATIVE MATRIX FACTORIZATION; AUDIO SOURCE SEPARATION; INFORMATION; MODELS;
DOI
10.1109/TASLP.2017.2656805
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
We present a blind source separation algorithm named GCC-NMF that combines unsupervised dictionary learning via non-negative matrix factorization (NMF) with spatial localization via the generalized cross correlation (GCC) method. Dictionary learning is performed on the mixture signal, with separation subsequently achieved by grouping dictionary atoms, at each point in time, according to their spatial origins. The resulting source separation algorithm is simple yet flexible, requiring no prior knowledge or information. Separation quality is evaluated for three tasks using stereo recordings from the publicly available SiSEC signal separation evaluation campaign: 3 and 4 concurrent speakers in reverberant environments, speech mixed with real-world background noise, and noisy recordings of a moving speaker. Performance is quantified using perceptually motivated and SNR-based measures with the PEASS and BSS Eval toolkits, respectively. We evaluate the effects of model parameters on separation quality, and compare our approach with other unsupervised and semi-supervised speech separation and enhancement approaches. We show that GCC-NMF is a flexible source separation algorithm, outperforming task-specific approaches in each of the three settings, including blind approaches as well as several informed approaches that require prior knowledge or information.
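The abstract describes a two-stage pipeline: NMF dictionary learning on the stereo mixture, followed by per-frame grouping of dictionary atoms according to their spatial origin estimated with GCC-PHAT. The sketch below illustrates that idea in Python under simplifying assumptions: a two-channel mixture, a single target source, scikit-learn's generic NMF standing in for the paper's unsupervised dictionary learning, and illustrative names (gcc_nmf_separate, n_atoms, n_tdoas, max_tdoa) that are not from the paper or its reference implementation.

```python
# Minimal sketch of the GCC-NMF idea from the abstract, assuming a stereo
# mixture and a single target source; not the authors' implementation.
import numpy as np
from scipy.signal import stft, istft
from sklearn.decomposition import NMF

def gcc_nmf_separate(x_left, x_right, fs, n_atoms=64, n_fft=1024,
                     n_tdoas=65, max_tdoa=0.001):
    # STFT of both channels.
    freqs, _, XL = stft(x_left, fs, nperseg=n_fft)
    _, _, XR = stft(x_right, fs, nperseg=n_fft)

    # 1) Unsupervised dictionary learning: NMF of the mixture magnitude spectrogram.
    V = np.abs(XL) + np.abs(XR)                                 # (freq, frame)
    nmf = NMF(n_components=n_atoms, init='random', max_iter=200, random_state=0)
    W = nmf.fit_transform(V)                                    # atoms       (freq, atom)
    H = nmf.components_                                         # activations (atom, frame)

    # 2) Spatial localization: GCC-PHAT evaluated on a grid of candidate TDOAs.
    #    max_tdoa = 1 ms assumes roughly 34 cm microphone spacing (illustrative).
    tdoas = np.linspace(-max_tdoa, max_tdoa, n_tdoas)
    cross = XL * np.conj(XR)
    phat = cross / (np.abs(cross) + 1e-12)                      # phase transform weighting
    steering = np.exp(2j * np.pi * np.outer(tdoas, freqs))      # (tdoa, freq)
    gcc = np.real(steering[:, :, None] * phat[None, :, :])      # (tdoa, freq, frame)

    # Per-frame, per-atom angular spectra: weight GCC-PHAT by each atom's spectrum,
    # then take the best-fitting TDOA for every atom at every frame.
    atom_spectra = np.einsum('dft,fk->dkt', gcc, W)             # (tdoa, atom, frame)
    atom_tdoa = tdoas[np.argmax(atom_spectra, axis=0)]          # (atom, frame)

    # Target direction: largest peak of GCC-PHAT pooled over frequency and time.
    target_tdoa = tdoas[np.argmax(gcc.sum(axis=(1, 2)))]

    # 3) Group atoms at each point in time by spatial origin: keep those whose
    #    estimated TDOA falls within one grid step of the target TDOA.
    keep = np.abs(atom_tdoa - target_tdoa) <= (tdoas[1] - tdoas[0])

    # 4) Wiener-like mask from the kept atoms, applied to both mixture channels.
    mask = (W @ (H * keep)) / (W @ H + 1e-12)
    _, y_left = istft(mask * XL, fs, nperseg=n_fft)
    _, y_right = istft(mask * XR, fs, nperseg=n_fft)
    return y_left, y_right
```

For a 16 kHz stereo signal x of shape (samples, 2), this would be called as y_l, y_r = gcc_nmf_separate(x[:, 0], x[:, 1], 16000). The single target group sketched here corresponds to the speech-enhancement setting; the multi-speaker tasks in the paper instead form one atom group per localized speaker and reconstruct each group separately.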
Pages: 745 - 755
Page count: 11
Related Papers
50 items total
  • [31] DEEP RECURRENT NMF FOR SPEECH SEPARATION BY UNFOLDING ITERATIVE THRESHOLDING
    Wisdom, Scott
    Powers, Thomas
    Pitton, James
    Atlas, Les
    2017 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2017, : 254 - 258
  • [32] Discriminative Training of NMF Model Based on Class Probabilities for Speech Enhancement
    Chung, Hanwook
    Plourde, Eric
    Champagne, Benoit
    IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (04) : 502 - 506
  • [33] A new regularized forward blind source separation algorithm for automatic speech quality enhancement
    Zoulikha, Meriem
    Djendi, Mohamed
    APPLIED ACOUSTICS, 2016, 112 : 192 - 200
  • [34] A hybrid speech enhancement system employing blind source separation and adaptive noise cancellation
    Low, SY
    Nordholm, S
    NORSIG 2004: PROCEEDINGS OF THE 6TH NORDIC SIGNAL PROCESSING SYMPOSIUM, 2004, 46 : 204 - 207
  • [35] A Variable Step Size-Forward Blind Source Separation Algorithm for Speech Enhancement
    Zoulikha, Meriem
    Djendi, Mohamed
    Guessoum, Abderezzak
    ADVANCED CONTROL ENGINEERING METHODS IN ELECTRICAL ENGINEERING SYSTEMS, 2019, 522 : 479 - 487
  • [36] Speech Enhancement Based on NMF Under Electric Vehicle Noise Condition
    Wang, Minghe
    Zhang, Erhua
    Tang, Zhenmin
    IEEE ACCESS, 2018, 6 : 9147 - 9159
  • [37] Training and compensation of class-conditioned NMF bases for speech enhancement
    Chung, Hanwook
    Badeau, Roland
    Plourde, Eric
    Champagne, Benoit
    NEUROCOMPUTING, 2018, 284 : 107 - 118
  • [38] Performance analysis of neural network, NMF and statistical approaches for speech enhancement
    Kandagatla, Ravi Kumar
    Potluri, Venkata Subbaiah
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (04) : 917 - 937
  • [39] NMF-Based Speech Enhancement Using Multitaper Spectrum Estimation
    Attabi, Yazid
    Chung, Hanwook
    Champagne, Benoit
    Zhu, Wei-Ping
    2018 INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2018, : 36 - 41
  • [40] DNN-Based Speech Enhancement via Integrating NMF and CASA
    Yan, Bofang
    Bao, Changchun
    Bai, Zhigang
    2018 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2018, : 435 - 439