Phase Processing for Single-Channel Speech Enhancement

Cited by: 176
Authors
Gerkmann, Timo [1, 2, 3]
Krawczyk-Becker, Martin [1 ]
Le Roux, Jonathan [4, 5]
Affiliations
[1] Siemens Corp Res, Princeton, NJ USA
[2] Royal Inst Technol, Stockholm, Sweden
[3] Carl von Ossietzky Univ Oldenburg, D-26111 Oldenburg, Germany
[4] Mitsubishi Elect Res Labs, Cambridge, MA USA
[5] Nippon Telegraph & Tel Commun Sci Labs, Kyoto, Japan
Keywords
SPECTRAL MAGNITUDE ESTIMATION; TIME FOURIER-TRANSFORM; SIGNAL ESTIMATION; VOCODER; AUDIO;
DOI
10.1109/MSP.2014.2369251
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject classification codes
0808; 0809
Abstract
With the advancement of technology, both assisted listening devices and speech communication devices are becoming more portable and also more frequently used. As a consequence, users of devices such as hearing aids, cochlear implants, and mobile telephones expect their devices to work robustly anywhere and at any time. This holds in particular for challenging noisy environments like a cafeteria, a restaurant, a subway, a factory, or in traffic. One way to make assisted listening devices robust to noise is to apply speech enhancement algorithms. To improve the corrupted speech, spatial diversity can be exploited by a constructive combination of microphone signals (so-called beamforming), and by exploiting the different spectro-temporal properties of speech and noise. Here, we focus on single-channel speech enhancement algorithms which rely on spectro-temporal properties. On the one hand, these algorithms can be employed when the miniaturization of devices only allows for using a single microphone. On the other hand, when multiple microphones are available, single-channel algorithms can be employed as a postprocessor at the output of a beamformer. To exploit the short-term stationary properties of natural sounds, many of these approaches process the signal in a time-frequency representation, most frequently the short-time discrete Fourier transform (STFT) domain. In this domain, the coefficients of the signal are complex-valued, and can therefore be represented by their absolute value (referred to in the literature both as STFT magnitude and STFT amplitude) and their phase. While the modeling and processing of the STFT magnitude have been the center of interest in the past three decades, phase has been largely ignored. In this article, we review the role of phase processing for speech enhancement in the context of assisted listening and speech communication devices.
We explain why most of the research conducted in this field used to focus on estimating spectral magnitudes in the STFT domain, and why recently phase processing is attracting increasing interest in the speech enhancement community. Furthermore, we review both early and recent methods for phase processing in speech enhancement. We aim to show that phase processing is an exciting field of research with the potential to make assisted listening and speech communication devices more robust in acoustically challenging environments.
Pages: 55-66
Page count: 12
Related papers
50 in total
  • [41] Modified Amplitude Spectral Estimator for Single-Channel Speech Enhancement
    Zhai, Zhenhui
    Ou, Shifeng
    Gao, Ying
    [J]. PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON ADVANCES IN MECHANICAL ENGINEERING AND INDUSTRIAL INFORMATICS (AMEII 2016), 2016, 73 : 1115 - 1120
  • [42] SPEAKER AND NOISE INDEPENDENT ONLINE SINGLE-CHANNEL SPEECH ENHANCEMENT
    Germain, Francois G.
    Mysore, Gautham J.
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 71 - 75
  • [43] Deep Neural Network for Supervised Single-Channel Speech Enhancement
    Saleem, Nasir
    Irfan Khattak, Muhammad
    Ali, Muhammad Yousaf
    Shafi, Muhammad
    [J]. ARCHIVES OF ACOUSTICS, 2019, 44 (01) : 3 - 12
  • [44] INVESTIGATION OF A PARAMETRIC GAIN APPROACH TO SINGLE-CHANNEL SPEECH ENHANCEMENT
    Huang, Gongping
    Chen, Jingdong
    Benesty, Jacob
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 206 - 210
  • [45] SINGLE-CHANNEL SPEECH ENHANCEMENT WITH SEQUENTIALLY TRAINED DNN SYSTEM
    Sun, Yang
    Xian, Yang
    Wang, Wenwu
    Naqvi, Syed Mohsen
    [J]. 2019 13TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ICSPCS), 2019,
  • [46] A PROBABILISTIC APPROACH FOR PHASE ESTIMATION IN SINGLE-CHANNEL SPEECH ENHANCEMENT USING VON MISES PHASE PRIORS
    Kulmer, Josef
    Mowlaee, Pejman
    Watanabe, Mario Kaoru
    [J]. 2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2014,
  • [47] Improved Phase Reconstruction in Single-Channel Speech Separation
    Mayer, Florian
    Mowlaee, Pejman
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1795 - 1799
  • [48] SINGLE-CHANNEL SPEECH ENHANCEMENT IN A TRANSIENT NOISE ENVIRONMENT BY EXPLOITING SPEECH HARMONICITY
    Wu, Kai
    Reju, V. G.
    Khong, Andy W. H.
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5088 - 5092
  • [49] Phase-Sensitive Decision-Directed SNR Estimator for Single-Channel Speech Enhancement
    Ou, Shifeng
    Song, Peng
    Gao, Ying
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2017, 31 (08)
  • [50] Phase-Aware Single-Channel Speech Enhancement With Modulation-Domain Kalman Filtering
    Dionelis, Nikolaos
    Brookes, Mike
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (05) : 937 - 950