Blind Speech Separation and Enhancement With GCC-NMF

Cited by: 40
Authors
Wood, Sean U. N. [1 ]
Rouat, Jean [1 ]
Dupont, Stephane [2 ]
Pironkov, Gueorgui [2 ]
Affiliations
[1] Univ Sherbrooke, Dept Elect & Comp Engn, NECOTIS, Sherbrooke, PQ J1K 2R1, Canada
[2] Univ Mons, Dept Theory Circuits & Signal Proc, B-7000 Mons, Belgium
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Blind speech separation; CASA; cocktail party problem; GCC; interaural time difference; NMF; PHAT; NONNEGATIVE MATRIX FACTORIZATION; AUDIO SOURCE SEPARATION; INFORMATION; MODELS;
DOI
10.1109/TASLP.2017.2656805
Chinese Library Classification
O42 [Acoustics];
Discipline Classification Codes
070206; 082403;
Abstract
We present a blind source separation algorithm named GCC-NMF that combines unsupervised dictionary learning via non-negative matrix factorization (NMF) with spatial localization via the generalized cross correlation (GCC) method. Dictionary learning is performed on the mixture signal, with separation subsequently achieved by grouping dictionary atoms, at each point in time, according to their spatial origins. The resulting source separation algorithm is simple yet flexible, requiring no prior knowledge or information. Separation quality is evaluated for three tasks using stereo recordings from the publicly available SiSEC signal separation evaluation campaign: 3 and 4 concurrent speakers in reverberant environments, speech mixed with real-world background noise, and noisy recordings of a moving speaker. Performance is quantified using perceptually motivated and SNR-based measures with the PEASS and BSS Eval toolkits, respectively. We evaluate the effects of model parameters on separation quality, and compare our approach with other unsupervised and semi-supervised speech separation and enhancement approaches. We show that GCC-NMF is a flexible source separation algorithm, outperforming task-specific approaches in each of the three settings, including blind approaches as well as several informed approaches that require prior knowledge or information.
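For illustration, the following minimal Python sketch shows the pipeline the abstract describes, using scipy for the STFT and scikit-learn for NMF. It is a sketch under stated assumptions, not the authors' implementation: the function name gcc_nmf_separate, the candidate-delay grid, the fixed TDOA tolerance, and the assumption that the target source's time difference of arrival (TDOA) is already known are all illustrative choices.

# Illustrative sketch of the GCC-NMF idea (assumptions noted above):
# learn an NMF dictionary on the mixture spectrogram, estimate each
# atom's per-frame TDOA with GCC-PHAT weighting, and keep the atoms
# whose TDOA falls near the target source's TDOA.
import numpy as np
from scipy.signal import stft, istft
from sklearn.decomposition import NMF

def gcc_nmf_separate(x_left, x_right, fs, target_tdoa, n_atoms=64,
                     n_fft=1024, tdoa_tolerance=2e-4):
    """Estimate the source near `target_tdoa` (seconds) from a stereo mix."""
    f, t, XL = stft(x_left, fs, nperseg=n_fft)
    _, _, XR = stft(x_right, fs, nperseg=n_fft)

    # Unsupervised dictionary learning on the mixture magnitude spectrogram.
    V = np.abs(XL) + np.abs(XR)                       # (freq, time)
    nmf = NMF(n_components=n_atoms, init='random', max_iter=200, random_state=0)
    W = nmf.fit_transform(V)                          # atoms:       (freq, atoms)
    H = nmf.components_                               # activations: (atoms, time)

    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross = XL * np.conj(XR)
    phat = cross / (np.abs(cross) + 1e-12)            # unit magnitude per bin

    # Candidate delays and their steering phases; GCC-PHAT peaks at the
    # true delay when summing phat * exp(+j 2 pi f tau) over frequency.
    taus = np.linspace(-1e-3, 1e-3, 101)              # candidate TDOAs (s)
    steering = np.exp(2j * np.pi * np.outer(taus, f)) # (delays, freq)

    mask = np.zeros_like(V)
    for k in range(n_atoms):
        # Atom-weighted GCC-PHAT function, one column per frame.
        gcc = np.real(steering @ (W[:, [k]] * phat))  # (delays, time)
        atom_tdoa = taus[np.argmax(gcc, axis=0)]      # per-frame TDOA of atom k
        keep = np.abs(atom_tdoa - target_tdoa) < tdoa_tolerance
        # Add atom k's contribution in the frames assigned to the target.
        mask += np.outer(W[:, k], H[k] * keep)

    mask = mask / (W @ H + 1e-12)                     # Wiener-like ratio mask
    _, y = istft(mask * XL, fs, nperseg=n_fft)
    return y

In the full algorithm, target TDOAs are themselves found as peaks of the GCC-PHAT angular spectrum, so no prior knowledge is needed; passing target_tdoa explicitly here simply keeps the sketch self-contained.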
Pages: 745-755
Number of pages: 11
Related papers
50 records in total
  • [1] Blind Speech Separation with GCC-NMF
    Wood, Sean U. N.
    Rouat, Jean
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 3329-3333
  • [2] Real-time Speech Enhancement with GCC-NMF
    Wood, Sean U. N.
    Rouat, Jean
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 2665-2669
  • [3] Real-time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA Jetson
    Wood, Sean U. N.
    Rouat, Jean
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 2048-2049
  • [4] Research on speech separation based on GCC-NMF
    Wu Junqin
    Wang Yingfu
    Journal of Jiangxi University of Science and Technology, 2020, 41 (05): 65-72
  • [5] Unsupervised Low Latency Speech Enhancement With RT-GCC-NMF
    Wood, Sean U. N.
    Rouat, Jean
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (02): 332-346
  • [6] Single Channel Blind Source Separation Based on NMF and Its Application to Speech Enhancement
    Chen, Yongqiang
    2017 IEEE 9TH INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS (ICCSN), 2017: 1066-1069
  • [7] DEEP NMF FOR SPEECH SEPARATION
    Le Roux, Jonathan
    Hershey, John R.
    Weninger, Felix
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 66-70
  • [8] Beamspace blind signal separation for speech enhancement
    Low, Siow Yong
    Yiu, Ka-Fai Cedric
    Nordholm, Sven
    OPTIMIZATION AND ENGINEERING, 2009, 10 (02): 313-330
  • [9] Online Parametric NMF for Speech Enhancement
    Kavalekalam, Mathew Shaji
    Nielsen, Jesper Kjaer
    Shi, Liming
    Christensen, Mads Graesboll
    Boldt, Jesper
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018: 2320-2324