Investigations on the Optimal Estimation of Speech Envelopes for the Two-Stage Speech Enhancement

Cited by: 0
Authors
Song, Yanjue [1]
Madhu, Nilesh [1]
Affiliations
[1] Univ Ghent, IDLab, imec, B-9000 Ghent, Belgium
Keywords
speech enhancement; speech envelope estimation; GRU; CRNN; PRIORI SNR ESTIMATION; SPECTRAL ENVELOPE; EXCITATION;
DOI
10.3390/s23146438
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
Using the source-filter model of speech production, clean speech signals can be decomposed into an excitation component and an envelope component that is related to the phoneme being uttered. Therefore, restoring the envelope of degraded speech during speech enhancement can improve the intelligibility and quality of output. As the number of phonemes in spoken speech is limited, they can be adequately represented by a correspondingly limited number of envelopes. This can be exploited to improve the estimation of speech envelopes from a degraded signal in a data-driven manner. The improved envelopes are then used in a second stage to refine the final speech estimate. Envelopes are typically derived from the linear prediction coefficients (LPCs) or from the cepstral coefficients (CCs). The improved envelope is obtained either by mapping the degraded envelope onto pre-trained codebooks (classification approach) or by directly estimating it from the degraded envelope (regression approach). In this work, we first investigate the optimal features for envelope representation and codebook generation by a series of oracle tests. We demonstrate that CCs provide better envelope representation compared to using the LPCs. Further, we demonstrate that a unified speech codebook is advantageous compared to the typical codebook that manually splits speech and silence as separate entries. Next, we investigate low-complexity neural network architectures to map degraded envelopes to the optimal codebook entry in practical systems. We confirm that simple recurrent neural networks yield good performance with a low complexity and number of parameters. We also demonstrate that with a careful choice of the feature and architecture, a regression approach can further improve the performance at a lower computational cost. 
However, as also seen from the oracle tests, the benefit of the two-stage framework is now chiefly limited by the statistical noise floor estimate, leading to only a limited improvement in extremely adverse conditions. This highlights the need for further research on joint estimation of speech and noise for optimum enhancement.
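The cepstral envelope representation and the codebook lookup mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame length, the FFT size, the number of retained coefficients, and the squared Euclidean distance used for the lookup are all assumptions made for illustration. In the paper the mapping to a codebook entry is learned by a neural network; the nearest-neighbour search below is only a simple stand-in.

```python
import numpy as np


def cepstral_envelope(frame, n_ceps=20, n_fft=512):
    """Return (cepstral coefficients, log-magnitude envelope) for one frame.

    Hypothetical helper: n_ceps and n_fft are illustrative defaults,
    not values taken from the paper.
    """
    windowed = frame * np.hanning(len(frame))
    # Log-magnitude spectrum; the small constant avoids log(0).
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-10)
    # Real cepstrum: inverse transform of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n_fft)
    # Keep only the low-quefrency part (and its symmetric mirror),
    # which carries the smooth spectral envelope.
    lifter = np.zeros(n_fft)
    lifter[:n_ceps] = 1.0
    if n_ceps > 1:
        lifter[-(n_ceps - 1):] = 1.0
    envelope = np.fft.rfft(cepstrum * lifter).real
    return cepstrum[:n_ceps], envelope


def nearest_codebook_entry(cc, codebook):
    """Index of the codebook row closest to cc (squared Euclidean distance).

    Assumption: the codebook is an array of shape (n_entries, n_ceps).
    """
    dists = np.sum((codebook - cc) ** 2, axis=1)
    return int(np.argmin(dists))
```

A frame of samples yields `n_ceps` coefficients and an envelope with `n_fft // 2 + 1` bins; passing the coefficients and a codebook array to `nearest_codebook_entry` returns the index of the closest entry.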
Pages: 17