Impact of phase estimation on single-channel speech separation based on time-frequency masking

Cited by: 20
Authors
Mayer, Florian [1]
Williamson, Donald S. [2]
Mowlaee, Pejman [3]
Wang, DeLiang [4]
Affiliations
[1] FH Joanneum Univ Appl Sci, Graz, Austria
[2] Indiana Univ, Dept Comp Sci, Bloomington, IN 47405 USA
[3] Graz Univ Technol, Signal Proc & Speech Commun Lab, Graz, Austria
[4] Ohio State Univ, Dept Comp Sci & Engn, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Source
Funding
Austrian Science Fund
Keywords
ALGORITHM; NOISE; RECONSTRUCTION;
DOI
10.1121/1.4986647
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Time-frequency masking is a common solution for the single-channel source separation (SCSS) problem where the goal is to find a time-frequency mask that separates the underlying sources from an observed mixture. An estimated mask is then applied to the mixed signal to extract the desired signal. During signal reconstruction, the time-frequency-masked spectral amplitude is combined with the mixture phase. This article considers the impact of replacing the mixture spectral phase with an estimated clean spectral phase combined with the estimated magnitude spectrum using a conventional model-based approach. As the proposed phase estimator requires estimated fundamental frequency of the underlying signal from the mixture, a robust pitch estimator is proposed. The upper-bound clean phase results show the potential of phase-aware processing in single-channel source separation. Also, the experiments demonstrate that replacing the mixture phase with the estimated clean spectral phase consistently improves perceptual speech quality, predicted speech intelligibility, and source separation performance across all signal-to-noise ratio and noise scenarios. (C) 2017 Acoustical Society of America.
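The reconstruction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random complex spectra, the array sizes, the oracle ratio-style mask, and the helper name `apply_mask` are all assumptions made for illustration, and the clean phase is taken directly from the known target (an upper bound) rather than estimated from a pitch-based model.

```python
import numpy as np

def apply_mask(mixture_stft, mask, phase=None):
    """Combine the masked mixture magnitude with a phase spectrum.

    If no phase is given, the mixture phase is reused, which is the
    conventional choice mentioned in the abstract.
    """
    magnitude = mask * np.abs(mixture_stft)
    if phase is None:
        phase = np.angle(mixture_stft)
    return magnitude * np.exp(1j * phase)

# Hypothetical STFT-domain data: (frequency bins, frames).
rng = np.random.default_rng(0)
shape = (257, 100)
target = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
interferer = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = target + interferer

# Oracle ratio-style mask built from the known magnitudes (assumption).
mask = np.abs(target) / (np.abs(target) + np.abs(interferer))

# Same masked magnitude, two different phase choices.
est_mixture_phase = apply_mask(mixture, mask)
est_clean_phase = apply_mask(mixture, mask, phase=np.angle(target))

def mse(estimate):
    """Mean squared error against the target spectrum."""
    return np.mean(np.abs(estimate - target) ** 2)

print("MSE with mixture phase:", mse(est_mixture_phase))
print("MSE with clean phase:  ", mse(est_clean_phase))
```

Because both estimates share the same magnitude, any difference in error comes from the phase alone. In every time-frequency bin the clean phase gives an error no larger than the mixture phase, which is the sense in which the abstract speaks of an upper bound for phase-aware processing.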
Pages: 4668-4679
Page count: 12
Related papers
50 in total
  • [1] TIME-FREQUENCY CONSTRAINTS FOR PHASE ESTIMATION IN SINGLE-CHANNEL SPEECH ENHANCEMENT
    Mowlaee, Pejman
    Saeidi, Rahim
    [J]. 2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2014, : 337 - 341
  • [2] PHASE RECONSTRUCTION WITH LEARNED TIME-FREQUENCY REPRESENTATIONS FOR SINGLE-CHANNEL SPEECH SEPARATION
    Wichern, Gordon
    Le Roux, Jonathan
    [J]. 2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 396 - 400
  • [3] Single-channel speech separation based on modulation frequency
    Gu, Lingyun
    Stern, Richard M.
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 25 - 28
  • [4] Time-Frequency Representations for Single-Channel Music Source Separation
    Tan, Vanessa H.
    de Leon, Franz
    [J]. 2019 INTERNATIONAL SYMPOSIUM ON MULTIMEDIA AND COMMUNICATION TECHNOLOGY (ISMAC), 2019,
  • [5] Phase estimation for signal reconstruction in single-channel speech separation
    Mowlaee, Pejman
    Saeidi, Rahim
    Martin, Rainer
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1546 - 1549
  • [6] TIME-FREQUENCY MASKING STRATEGIES FOR SINGLE-CHANNEL LOW-LATENCY SPEECH ENHANCEMENT USING NEURAL NETWORKS
    Parviainen, Mikko
    Pertila, Pasi
    Virtanen, Tuomas
    Grosche, Peter
    [J]. 2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 51 - 55
  • [7] Robust speech separation using time-frequency masking
    Aarabi, P
    Shi, GJ
    Jahromi, O
    [J]. 2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003, : 741 - 744
  • [8] Single-Channel Speech Enhancement Based on Psychoacoustic Masking
    Zhou, Tingting
    Zeng, Yumin
    Wang, Rongrong
    [J]. JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2017, 65 (04): : 272 - 284
  • [9] Blind separation of speech mixtures via time-frequency masking
    Yilmaz, Ö
    Rickard, S
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2004, 52 (07) : 1830 - 1847
  • [10] A Phase-Based Time-Frequency masking for multi-channel speech enhancement in domestic environments
    Brutti, Alessio
    Tsiami, Antigoni
    Katsamanis, Athanasios
    Maragos, Petros
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2875 - 2879