Phase Processing for Single-Channel Speech Enhancement

Cited by: 176
Authors
Gerkmann, Timo [1 ,2 ,3 ]
Krawczyk-Becker, Martin [1 ]
Le Roux, Jonathan [4 ,5 ]
Affiliations
[1] Siemens Corp Res, Princeton, NJ USA
[2] Royal Inst Technol, Stockholm, Sweden
[3] Carl von Ossietzky Univ Oldenburg, D-26111 Oldenburg, Germany
[4] Mitsubishi Elect Res Labs, Cambridge, MA USA
[5] Nippon Telegraph & Tel Commun Sci Labs, Kyoto, Japan
Keywords
SPECTRAL MAGNITUDE ESTIMATION; SHORT-TIME FOURIER-TRANSFORM; SIGNAL ESTIMATION; VOCODER; AUDIO
DOI
10.1109/MSP.2014.2369251
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
With the advancement of technology, both assisted listening devices and speech communication devices are becoming more portable and more frequently used. As a consequence, users of devices such as hearing aids, cochlear implants, and mobile telephones expect their devices to work robustly anywhere and at any time. This holds in particular for challenging noisy environments like a cafeteria, a restaurant, a subway, a factory, or traffic. One way to make assisted listening devices robust to noise is to apply speech enhancement algorithms. To improve the corrupted speech, spatial diversity can be exploited by a constructive combination of microphone signals (so-called beamforming), and the different spectro-temporal properties of speech and noise can be exploited as well. Here, we focus on single-channel speech enhancement algorithms that rely on spectro-temporal properties. On the one hand, these algorithms can be employed when the miniaturization of devices only allows for using a single microphone. On the other hand, when multiple microphones are available, single-channel algorithms can be employed as a postprocessor at the output of a beamformer. To exploit the short-term stationary properties of natural sounds, many of these approaches process the signal in a time-frequency representation, most frequently the short-time discrete Fourier transform (STFT) domain. In this domain, the coefficients of the signal are complex-valued and can therefore be represented by their absolute value (referred to in the literature both as the STFT magnitude and the STFT amplitude) and their phase. While the modeling and processing of the STFT magnitude have been the center of interest in the past three decades, the phase has been largely ignored. In this article, we review the role of phase processing for speech enhancement in the context of assisted listening and speech communication devices. We explain why most of the research conducted in this field used to focus on estimating spectral magnitudes in the STFT domain, and why phase processing has recently been attracting increasing interest in the speech enhancement community. Furthermore, we review both early and recent methods for phase processing in speech enhancement. We aim to show that phase processing is an exciting field of research with the potential to make assisted listening and speech communication devices more robust in acoustically challenging environments.
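To make the magnitude-centric pipeline described in the abstract concrete, the following minimal Python sketch (using numpy and scipy.signal's stft/istft) enhances the STFT magnitude with a simplified Wiener-like gain and resynthesizes the signal with the unmodified noisy phase. The toy sinusoid, the noise-PSD estimate from a noise-only excerpt, and the gain rule are illustrative assumptions, not the estimators reviewed in the article.

```python
# Minimal sketch of conventional magnitude-only enhancement:
# enhance |Y|, keep the noisy phase, resynthesize.
import numpy as np
from scipy.signal import stft, istft

fs = 16000                                        # assumed sampling rate
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # toy stand-in for speech
noise = 0.3 * rng.standard_normal(fs)
noisy = clean + noise

# Analysis: complex STFT coefficients, split into magnitude and phase.
_, _, Y = stft(noisy, fs=fs, nperseg=512)
mag, phase = np.abs(Y), np.angle(Y)

# Noise PSD estimated from a noise-only excerpt (assumed available here).
_, _, N = stft(noise, fs=fs, nperseg=512)
noise_psd = np.mean(np.abs(N) ** 2, axis=1, keepdims=True) + 1e-12

# Magnitude enhancement with a simple Wiener-like gain, floored to limit
# over-suppression (a placeholder for the magnitude estimators in the literature).
snr = np.maximum(mag ** 2 / noise_psd - 1.0, 1e-3)
gain = snr / (1.0 + snr)
enhanced_mag = gain * mag

# Synthesis: recombine the enhanced magnitude with the *unmodified* noisy phase.
_, enhanced = istft(enhanced_mag * np.exp(1j * phase), fs=fs, nperseg=512)
```

Replacing or refining the noisy phase in that last synthesis step is precisely where the phase-processing methods surveyed in the article depart from this conventional approach.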
Pages: 55-66
Page count: 12
Related papers
50 items in total
  • [1] Phase-Aware Single-channel Speech Enhancement
    Mowlaee, Pejman
    Watanabe, Mario Kaoru
    Saeidi, Rahim
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1871 - 1873
  • [2] Phase Based Single-Channel Speech Enhancement Using Phase Ratio
    Singh, Sachin
    Mutawa, A. M.
    Gupta, Monika
    Tripathy, Manoj
    Anand, R. S.
    [J]. 2017 6TH INTERNATIONAL CONFERENCE ON COMPUTER APPLICATIONS IN ELECTRICAL ENGINEERING - RECENT ADVANCES (CERA), 2017, : 393 - 396
  • [3] On Phase Importance in Parameter Estimation in Single-Channel Speech Enhancement
    Mowlaee, Pejman
    Saeidi, Rahim
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7462 - 7466
  • [4] On Speech Intelligibility Estimation of Phase-Aware Single-Channel Speech Enhancement
    Gaich, Andreas
    Mowlaee, Pejman
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2553 - 2557
  • [5] STFT Phase Reconstruction in Voiced Speech for an Improved Single-Channel Speech Enhancement
    Krawczyk, Martin
    Gerkmann, Timo
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (12) : 1931 - 1940
  • [6] Two-Stage Temporal Processing for Single-Channel Speech Enhancement
    Samui, Sunzan
    Chakrabarti, Indrajit
    Ghosh, Soumya Kanti
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3723 - 3727
  • [7] Weak Speech Recovery for Single-Channel Speech Enhancement
    Wong, Arthur
    Ming, Kok
    Low, Siow Yong
    [J]. 2012 4TH INTERNATIONAL CONFERENCE ON INTELLIGENT AND ADVANCED SYSTEMS (ICIAS), VOLS 1-2, 2012, : 627 - 631
  • [8] Single-Channel Speech Enhancement With Phase Reconstruction Based on Phase Distortion Averaging
    Wakabayashi, Yukoh
    Fukumori, Takahiro
    Nakayama, Masato
    Nishiura, Takanobu
    Yamashita, Yoichi
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (09) : 1559 - 1569
  • [9] Phase Estimation in Single-Channel Speech Enhancement Using Phase Invariance Constraints
    Pirolt, Michael
    Stahl, Johannes
    Mowlaee, Pejman
    Vorobiov, Vasili I.
    Barysenka, Siarhei Y.
    Davydov, Andrew G.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5585 - 5589
  • [10] Phase Estimation in Single-Channel Speech Enhancement: Limits-Potential
    Mowlaee, Pejman
    Kulmer, Josef
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (08) : 1283 - 1294