Variance based time-frequency mask estimation for unsupervised speech enhancement

Cited by: 4
Authors
Saleem, Nasir [1, 2]
Khattak, Muhammad Irfan [2]
Witjaksono, Gunawan [3]
Ahmad, Gulzar [2]
Affiliations
[1] Gomal Univ, Fac Engn & Technol, Dept Elect Engn, Dera Ismail Khan 29050, Pakistan
[2] Univ Engn & Technol, Dept Elect Engn, Peshawar 25000, Pakistan
[3] Univ Teknol PETRONAS, Dept Elect & Elect Engn, Seri Iskandar, Malaysia
Keywords
A priori SNR estimation; Speech enhancement; Time-frequency masking; Variance-based features; Wiener gain; Intelligibility; Speech quality; NOISE-ESTIMATION; RESIDUAL NOISE; ACOUSTIC NOISE; BINARY; REDUCTION; ALGORITHM; NETWORKS; SPECTRUM
DOI
10.1007/s11042-019-08032-y
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A variance-based two-dimensional time-frequency mask estimation method for unsupervised speech enhancement is proposed to improve speech quality and intelligibility by reducing the low-frequency residual noise distortion in noisy speech signals. Unlike conventional speech enhancement methods, the proposed method reduces the residual noise distortion by combining a less aggressive Wiener gain with a variance-based two-dimensional time-frequency mask in a two-stage speech enhancement scheme. In the first stage, the less aggressive Wiener gain with a modified a priori signal-to-noise ratio (SNR) estimate is applied to the input noisy speech to obtain a pre-processed speech signal with reduced noise. In the second stage, variance-based features are extracted from the pre-processed speech and compared to a nonparametric adaptive threshold to construct a two-dimensional time-frequency mask. The estimated mask is then applied to the pre-processed speech from the first stage to suppress the annoying residual noise distortion. A comparative performance study is included to demonstrate the effectiveness of the proposed method in various noisy conditions. The experimental results showed large improvements in terms of the perceptual evaluation of speech quality (PESQ), segmental SNR (SegSNR), residual noise distortion (BAK) and speech distortion (SIG) over those achieved with competing methods at different input SNRs. To measure how well the enhanced speech is understood in different noisy conditions, the short-time objective intelligibility (STOI) measure is used, which confirmed the better performance of the proposed method in terms of speech intelligibility. The time-varying spectral analysis validated a significant reduction of the residual noise components in the enhanced speech.
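The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the modified a priori SNR estimate, the variance features, or the nonparametric threshold, so the sketch substitutes the standard decision-directed estimate, a local variance of the log power spectrum, and a per-frequency median threshold. All function names, the gain floor, and the assumption that high local variance indicates speech are hypothetical choices made for illustration only.

```python
import numpy as np


def wiener_gain(noisy_power, noise_power, floor=0.1, alpha=0.98):
    """Stage 1 (assumed form): floored Wiener gain per time-frequency bin.

    noisy_power: (frames, bins) power spectrogram of the noisy speech.
    noise_power: (bins,) noise power estimate, assumed known here.
    The a priori SNR uses the standard decision-directed rule as a stand-in
    for the paper's modified estimate; `floor` keeps the gain less aggressive.
    """
    gain = np.empty_like(noisy_power, dtype=float)
    prev_clean = None
    for t in range(noisy_power.shape[0]):
        post_snr = noisy_power[t] / noise_power
        ml_term = np.maximum(post_snr - 1.0, 0.0)
        if prev_clean is None:
            prio_snr = ml_term
        else:
            prio_snr = alpha * prev_clean / noise_power + (1.0 - alpha) * ml_term
        gain[t] = np.maximum(prio_snr / (1.0 + prio_snr), floor)
        # Clean-speech power estimate of this frame, reused in the next frame.
        prev_clean = gain[t] ** 2 * noisy_power[t]
    return gain


def local_variance(x, half_t=1, half_f=1):
    """Variance of x over a small time-frequency neighbourhood (edge-padded)."""
    padded = np.pad(x, ((half_t, half_t), (half_f, half_f)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (2 * half_t + 1, 2 * half_f + 1)
    )
    return windows.var(axis=(-2, -1))


def variance_mask(pre_power, mask_floor=0.1):
    """Stage 2 (assumed form): threshold local variance to build a 2-D mask.

    The per-frequency median is a simple data-driven threshold chosen for
    this sketch; bins below it are attenuated to `mask_floor`, not zeroed.
    """
    feature = local_variance(np.log(pre_power + 1e-12))
    threshold = np.median(feature, axis=0, keepdims=True)
    return np.where(feature > threshold, 1.0, mask_floor)


def enhance(noisy_power, noise_power):
    """Apply stage 1, then mask the pre-processed spectrogram (stage 2)."""
    gain = wiener_gain(noisy_power, noise_power)
    pre_power = gain ** 2 * noisy_power
    mask = variance_mask(pre_power)
    return mask ** 2 * pre_power
```

In use, `noisy_power` would come from the squared magnitude of a short-time Fourier transform and `noise_power` from a separate noise estimator; the enhanced power would then be recombined with the noisy phase before inverse transformation. Those steps are outside what the abstract describes and are omitted here.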
Pages: 31867-31891
Number of pages: 25