AUGMENTED TIME-FREQUENCY MASK ESTIMATION IN CLUSTER-BASED SOURCE SEPARATION ALGORITHMS

Cited by: 0
Authors
Luo, Yi [1]
Mesgarani, Nima [1]
Affiliations
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Funding
National Science Foundation (USA)
Keywords
Source separation; clustering; mask estimation; deep learning;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Time-frequency mask estimation with various clustering approaches has proven effective in solving the audio source separation problem. In this framework, the time-frequency bins of the mixture spectrogram are represented in a high-dimensional embedding space, where various methods can be applied to group the embedded points, compute either hard or soft source assignments, and subsequently derive the time-frequency masks. However, in the majority of current approaches, the assignment algorithm differs between the training and inference phases, which leads to a suboptimal solution: the assignment objective used during training (e.g., the ideal binary mask) is not the same as the one used during inference (e.g., k-means clustering). We propose a method to reduce this mismatch, in which the source embedding is trained such that source assignment during the training and inference phases produces similar outcomes. Our results show that matching the source assignment between the training and inference phases yields more accurate and consistent mask estimation at inference time, which significantly improves source separation accuracy for various hard and soft clustering methods.
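As a rough illustration of the two assignment rules the abstract contrasts, the sketch below builds an ideal binary mask from known source magnitudes and, separately, derives hard and soft masks by k-means clustering of per-bin embeddings. It is not the authors' method: the embeddings are synthetic, and every name here (`ideal_binary_mask`, `kmeans`, `soft_masks`, `sharpness`) is a hypothetical choice made for this example.

```python
import numpy as np

def ideal_binary_mask(source_mags):
    """source_mags: (C, N) magnitudes of C sources over N time-frequency bins."""
    dominant = np.argmax(source_mags, axis=0)           # index of loudest source per bin
    return np.eye(source_mags.shape[0])[dominant].T     # (C, N) one-hot mask

def squared_dists(embeddings, centroids):
    """Squared Euclidean distance from each of N embeddings to each of C centroids."""
    diff = embeddings[:, None, :] - centroids[None, :, :]
    return np.sum(diff ** 2, axis=2)                    # (N, C)

def kmeans(embeddings, n_clusters, n_iters=20):
    """Plain Lloyd's k-means on (N, D) embeddings; returns (centroids, labels)."""
    # Farthest-point initialisation keeps this sketch deterministic.
    centroids = embeddings[:1].copy()
    for _ in range(1, n_clusters):
        nearest = squared_dists(embeddings, centroids).min(axis=1)
        centroids = np.vstack([centroids, embeddings[np.argmax(nearest)]])
    for _ in range(n_iters):
        labels = np.argmin(squared_dists(embeddings, centroids), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = embeddings[labels == k].mean(axis=0)
    labels = np.argmin(squared_dists(embeddings, centroids), axis=1)
    return centroids, labels

def soft_masks(embeddings, centroids, sharpness=5.0):
    """Softmax over negative squared distances; `sharpness` is an assumed knob."""
    logits = -sharpness * squared_dists(embeddings, centroids)
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    weights = np.exp(logits)
    return (weights / weights.sum(axis=1, keepdims=True)).T   # (C, N)

# Synthetic stand-in for a trained embedding: two well-separated clusters.
rng = np.random.default_rng(0)
n_bins, dim = 200, 4
true_source = rng.integers(0, 2, size=n_bins)
source_mags = np.stack([np.where(true_source == c, 1.0, 0.1) for c in range(2)])
centers = np.array([[2.0, 0.0, 0.0, 0.0], [-2.0, 0.0, 0.0, 0.0]])
embeddings = centers[true_source] + 0.3 * rng.standard_normal((n_bins, dim))

ibm = ideal_binary_mask(source_mags)                    # training-style target
centroids, labels = kmeans(embeddings, n_clusters=2)    # inference-style assignment
hard = np.eye(2)[labels].T                              # (C, N) hard masks
soft = soft_masks(embeddings, centroids)                # (C, N) soft masks

# Cluster order is arbitrary, so compare under both permutations.
agreement = max(np.mean(hard == ibm), np.mean(hard[::-1] == ibm))
```

Here `agreement` measures how often the clustering-based hard mask matches the ideal binary mask. With cleanly separated embeddings the two rules agree closely; the mismatch described in the abstract concerns embeddings for which they do not.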
Pages: 710-714
Page count: 5