DNN-Based Calibrated-Filter Models for Speech Enhancement

Cited by: 3
Authors
Attabi, Yazid [1]
Champagne, Benoit [1]
Zhu, Wei-Ping [2]
Affiliations
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 0E9, Canada
[2] Concordia Univ, Dept Elect & Comp Engn, Montreal, PQ H3G 1M8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Speech enhancement; Wiener filter gain function; Gain calibration; Multi-taper spectral analysis; Deep neural network (DNN); Non-negative matrix factorization (NMF); NONNEGATIVE MATRIX FACTORIZATION; NOISE; MASKING;
DOI
10.1007/s00034-020-01604-6
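Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract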
In this paper, we present a new two-stage speech enhancement approach, specifically designed to reduce musical noise and other random noise without requiring their localization in the time-frequency domain. The proposed method is motivated by two observations: (1) the energy peaks corresponding to musical noise are randomly scattered in the spectrogram of the processed speech; and (2) Wiener filter gains calculated at different frequencies are correlated. In the first stage of the proposed method, a preliminary gain function is generated using the nonnegative matrix factorization (NMF) algorithm. In the second stage, a modified gain function that is more robust to noise artefacts, referred to as the calibrated filter, is estimated by applying a DNN-based nonlinear mapping function to the preliminary gain function. To further decrease the variability of the estimated calibrated filter, we propose to extend the DNN-based extraction of frequency dependencies to a set of preliminary gain functions derived from spectral estimates based on a family of data tapers; the resulting calibrated filter is referred to as the multi-filter. The evaluation of the proposed DNN-based calibrated filter models for speech enhancement, under different noise types and input SNR levels, shows substantial improvements in standard speech quality and intelligibility measures compared to the uncalibrated filter.
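As a rough illustration of the first stage described in the abstract, a Wiener-type gain can be formed from NMF estimates of the speech and noise spectra. The sketch below assumes the common ratio form G = S / (S + N), where S and N are the products of NMF dictionaries and activations. The function names, the toy matrices, and the `eps` regularizer are illustrative assumptions and not the authors' implementation. The DNN-based calibration stage and the multi-taper extension are not shown.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def wiener_gain(w_speech, h_speech, w_noise, h_noise, eps=1e-12):
    """Preliminary Wiener-type gain from NMF speech and noise models.

    w_* are (frequency x basis) dictionaries and h_* are (basis x frame)
    activations; all entries are assumed non-negative. The eps term is an
    assumed safeguard against division by zero.
    """
    speech_psd = matmul(w_speech, h_speech)  # estimated speech spectrum
    noise_psd = matmul(w_noise, h_noise)     # estimated noise spectrum
    return [[s / (s + n + eps) for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_psd, noise_psd)]


# Toy example (hypothetical values): 2 frequency bins, 1 basis each, 2 frames.
gain = wiener_gain([[1.0], [2.0]], [[3.0, 0.0]],
                   [[1.0], [1.0]], [[1.0, 1.0]])
```

Each entry of `gain` lies between 0 and 1 and would multiply the corresponding time-frequency bin of the noisy spectrum. In the approach summarized above, such a preliminary gain would then be passed to the DNN-based mapping to obtain the calibrated filter.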
Pages: 2926-2949
Number of pages: 24