DeepLPC: A Deep Learning Approach to Augmented Kalman Filter-Based Single-Channel Speech Enhancement

Cited by: 10
Authors
Roy, Sujan Kumar [1]
Nicolson, Aaron [2]
Paliwal, Kuldip K. [1]
Affiliations
[1] Griffith University, Signal Processing Laboratory, Nathan Campus, Brisbane, QLD 4111, Australia
[2] CSIRO, Australian e-Health Research Centre, Herston, QLD 4006, Australia
Source
IEEE Access | 2021 / Vol. 9
Keywords
Speech enhancement; Noise measurement; Deep learning; Signal to noise ratio; Distortion; Kalman filters; Estimation; Kalman filter; augmented Kalman filter; deep neural networks; temporal convolutional network; LPC; Colored noise; Quality; Masking
DOI
10.1109/ACCESS.2021.3075209
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Current deep learning approaches to linear prediction coefficient (LPC) estimation for the augmented Kalman filter (AKF) produce biased estimates because they use a whitening filter. This bias severely degrades the perceived quality and intelligibility of the speech enhanced by the AKF. In this paper, we propose a deep learning framework that avoids the whitening filter and thereby produces clean speech and noise LPC estimates with significantly less bias than previous methods. The proposed framework, called DeepLPC, jointly estimates the clean speech and noise LPC power spectra. The inverse Fourier transform converts the estimated clean speech and noise LPC power spectra into autocorrelation matrices, which are then solved with the Levinson-Durbin recursion to obtain the LPCs and prediction error variances of the speech and noise for the AKF. DeepLPC is evaluated on the NOIZEUS and DEMAND Voice Bank datasets using subjective AB listening tests and seven objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR), and is compared with six existing deep learning-based methods. DeepLPC yields a lower spectral distortion (SD) level than existing deep learning approaches to clean speech LPC estimation, confirming that its estimates are less biased. DeepLPC also produces higher objective scores than any of the competing methods, improving on the next best method by 0.11 for CSIG, 0.15 for CBAK, 0.14 for COVL, 0.13 for PESQ, 2.66% for STOI, 1.11 dB for SegSNR, and 1.05 dB for SI-SDR. The enhanced speech produced by DeepLPC was also the most preferred by the 10 listeners. By producing less biased clean speech and noise LPC estimates, DeepLPC enables the AKF to produce enhanced speech of higher quality and intelligibility.
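The final stage described in the abstract (inverse Fourier transform of an LPC power spectrum to get autocorrelations, then the Levinson-Durbin recursion to get the LPCs and prediction error variance) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, a one-sided power spectrum on n_fft // 2 + 1 bins, and hypothetical helper names (`lpc_power_spectrum`, `levinson_durbin`, `lpc_from_power_spectrum`). In DeepLPC the spectra would come from the network; here a spectrum built from known coefficients is used so that the recovery can be checked.

```python
import numpy as np


def lpc_power_spectrum(a, sigma2, n_fft):
    """One-sided LPC power spectrum sigma2 / |A(e^{jw})|^2 on n_fft // 2 + 1 bins.

    Assumed convention: A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, with a[0] == 1.
    """
    A = np.fft.rfft(a, n=n_fft)  # zero-pads a to n_fft samples before the DFT
    return sigma2 / np.abs(A) ** 2


def levinson_durbin(r, order):
    """Solve the Toeplitz (Yule-Walker) system defined by autocorrelations r[0..order].

    Returns (a, err): a[0] == 1 and err is the prediction error variance.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the order-(i-1) prediction error and lag i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_from_power_spectrum(psd, order, n_fft):
    """Recover LPCs and prediction error variance from a one-sided power spectrum."""
    # Wiener-Khinchin: the inverse DFT of a power spectrum is an autocorrelation sequence.
    r = np.fft.irfft(psd, n=n_fft)
    return levinson_durbin(r[: order + 1], order)


# Usage: build the spectrum of a stable AR(2) model and recover its parameters.
a_true = np.array([1.0, -1.5, 0.7])
psd = lpc_power_spectrum(a_true, sigma2=0.5, n_fft=512)
a_est, err = lpc_from_power_spectrum(psd, order=2, n_fft=512)
print(np.round(a_est, 6), round(err, 6))
```

The recursion works on the autocorrelation sequence directly instead of forming the full autocorrelation matrix, since Levinson-Durbin exploits that matrix's Toeplitz structure. With a large enough n_fft the time-domain aliasing of the inverse DFT is negligible, so the recovered a_est and err match the generating coefficients and variance up to floating-point error.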
Pages: 64524-64538
Number of pages: 15