Impact of phase estimation on single-channel speech separation based on time-frequency masking

Cited by: 20

Authors:

Mayer, Florian [1]

Williamson, Donald S. [2]

Mowlaee, Pejman [3]

Wang, DeLiang [4]

Affiliations:

[1] FH Joanneum Univ Appl Sci, Graz, Austria

[2] Indiana Univ, Dept Comp Sci, Bloomington, IN 47405 USA

[3] Graz Univ Technol, Signal Proc & Speech Commun Lab, Graz, Austria

[4] Ohio State Univ, Dept Comp Sci & Engn, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2017 / Vol. 141 / Issue 06

Funding:

Austrian Science Fund;

Keywords:

ALGORITHM; NOISE; RECONSTRUCTION;

DOI:

10.1121/1.4986647

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Time-frequency masking is a common solution for the single-channel source separation (SCSS) problem, where the goal is to find a time-frequency mask that separates the underlying sources from an observed mixture. The estimated mask is then applied to the mixed signal to extract the desired signal. During signal reconstruction, the time-frequency-masked spectral amplitude is conventionally combined with the mixture phase. This article considers the impact of replacing the mixture spectral phase with an estimated clean spectral phase, combined with the magnitude spectrum estimated by a conventional model-based approach. As the proposed phase estimator requires an estimate of the fundamental frequency of the underlying signal from the mixture, a robust pitch estimator is proposed. The upper-bound clean phase results show the potential of phase-aware processing in single-channel source separation. The experiments also demonstrate that replacing the mixture phase with the estimated clean spectral phase consistently improves perceptual speech quality, predicted speech intelligibility, and source separation performance across all signal-to-noise ratios and noise scenarios. © 2017 Acoustical Society of America.
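
The reconstruction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the toy spectra, the oracle ratio mask, and the helper names `apply_mask` and `squared_error` are all assumptions made for illustration, and the clean phase is taken directly from the known speech signal (the oracle "upper-bound" case) rather than from the paper's pitch-based phase estimator. The sketch applies the same masked magnitude twice, once with the mixture phase and once with the clean phase, and compares the error against the clean spectrum.

```python
import cmath

def apply_mask(mixture, mask, phase_source):
    """Scale each mixture magnitude by the mask and attach the phase of phase_source."""
    return [m * abs(y) * cmath.exp(1j * cmath.phase(p))
            for y, m, p in zip(mixture, mask, phase_source)]

def squared_error(estimate, reference):
    """Sum of squared complex differences over all frequency bins."""
    return sum(abs(e - r) ** 2 for e, r in zip(estimate, reference))

# Toy complex spectra for a single frame (hypothetical values, four bins).
speech = [1.0 + 0.5j, -0.8 + 0.2j, 0.3 - 0.9j, 0.05 + 0.02j]
noise = [0.2 - 0.4j, 0.5 + 0.5j, -0.1 + 0.1j, 0.6 - 0.3j]
mixture = [s + n for s, n in zip(speech, noise)]

# Assumed mask: an oracle ratio mask built from the known magnitudes.
mask = [abs(s) / (abs(s) + abs(n)) for s, n in zip(speech, noise)]

# Same masked magnitude, two different phase choices.
with_mixture_phase = apply_mask(mixture, mask, mixture)
with_clean_phase = apply_mask(mixture, mask, speech)

err_mixture_phase = squared_error(with_mixture_phase, speech)
err_clean_phase = squared_error(with_clean_phase, speech)
print(err_mixture_phase, err_clean_phase)
```

For a fixed magnitude estimate, the oracle clean phase can never give a larger per-bin error than the mixture phase, which is why the clean-phase result serves as an upper bound on what a phase estimator could achieve.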

Pages: 4668-4679

Number of pages: 12

50 records in total

[1] TIME-FREQUENCY CONSTRAINTS FOR PHASE ESTIMATION IN SINGLE-CHANNEL SPEECH ENHANCEMENT
Mowlaee, Pejman
Saeidi, Rahim
[J]. 2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2014, : 337 - 341
[2] PHASE RECONSTRUCTION WITH LEARNED TIME-FREQUENCY REPRESENTATIONS FOR SINGLE-CHANNEL SPEECH SEPARATION
Wichern, Gordon
Le Roux, Jonathan
[J]. 2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 396 - 400
[3] Single-channel speech separation based on modulation frequency
Gu, Lingyun
Stern, Richard M.
[J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 25 - 28
[4] Time-Frequency Representations for Single-Channel Music Source Separation
Tan, Vanessa H.
de Leon, Franz
[J]. 2019 INTERNATIONAL SYMPOSIUM ON MULTIMEDIA AND COMMUNICATION TECHNOLOGY (ISMAC), 2019,
[5] Phase estimation for signal reconstruction in single-channel speech separation
Mowlaee, Pejman
Saeidi, Rahim
Martin, Rainer
[J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1546 - 1549
[6] TIME-FREQUENCY MASKING STRATEGIES FOR SINGLE-CHANNEL LOW-LATENCY SPEECH ENHANCEMENT USING NEURAL NETWORKS
Parviainen, Mikko
Pertila, Pasi
Virtanen, Tuomas
Grosche, Peter
[J]. 2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 51 - 55
[7] Robust speech separation using time-frequency masking
Aarabi, P
Shi, GJ
Jahromi, O
[J]. 2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003, : 741 - 744
[8] Single-Channel Speech Enhancement Based on Psychoacoustic Masking
Zhou, Tingting
Zeng, Yumin
Wang, Rongrong
[J]. JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2017, 65 (04): : 272 - 284
[9] Blind separation of speech mixtures via time-frequency masking
Yilmaz, Ö
Rickard, S
[J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2004, 52 (07) : 1830 - 1847
[10] A Phase-Based Time-Frequency masking for multi-channel speech enhancement in domestic environments
Brutti, Alessio
Tsiami, Antigoni
Katsamanis, Athanasios
Maragos, Petros
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2875 - 2879