Noise estimation based on time-frequency correlation for speech enhancement

Cited by: 5
Authors
Yuan, Wenhao [1]
Lin, Jiajun [1]
An, Wei [1]
Wang, Yu [1]
Chen, Ning [1]
Affiliations
[1] East China University of Science and Technology, School of Information Science and Engineering, Shanghai 200237, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Noise estimation; Speech enhancement; Minimum search; Improved Minima Controlled Recursive Averaging; ENVIRONMENTS; RECOGNITION
DOI
10.1016/j.apacoust.2012.11.007
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
As a fundamental part of speech enhancement, noise estimation is particularly challenging in highly nonstationary noise environments. In this work, we propose an effective algorithm based on Improved Minima Controlled Recursive Averaging (IMCRA), with the objective of improving the performance of noise estimation. The main contributions of this work are: (i) a rough decision about speech presence is made by calculating the autocorrelation and cross-channel correlation of the time-frequency (T-F) units; (ii) with this decision, we refine the smoothing parameters used for smoothing the noisy power spectrum and for the recursive averaging in noise spectrum estimation, as well as the weighting factor for the a priori signal-to-noise ratio (SNR) estimation in the IMCRA; (iii) we improve the search for local minima during spectral bursts by adding a minimum search with a shorter window. Extensive experiments are carried out to evaluate the performance of the proposed algorithm. The experimental results show that, compared with the IMCRA, the proposed approach significantly improves the accuracy of noise spectrum estimation and the quality of enhanced speech in typical noise situations. © 2012 Elsevier Ltd. All rights reserved.
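The role of the search window in minima-controlled noise tracking can be illustrated with a small sketch. This is not the authors' algorithm and not a full IMCRA implementation: it tracks a single frequency bin, uses a plain ratio test as a stand-in for the speech-presence decision, and the function names (`sliding_minimum`, `recursive_noise_estimate`) and parameter values (`alpha_d`, `presence_ratio`, the window lengths) are hypothetical choices made only for illustration.

```python
from collections import deque


def sliding_minimum(values, window):
    """Return, for each index, the minimum of the last `window` values."""
    candidates = deque()  # indices whose values are strictly increasing
    minima = []
    for i, v in enumerate(values):
        # Drop candidates that can never be the minimum again.
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()
        candidates.append(i)
        # Drop the oldest candidate once it falls out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        minima.append(values[candidates[0]])
    return minima


def recursive_noise_estimate(power, window, alpha_d=0.9, presence_ratio=3.0):
    """Recursive averaging of `power`, frozen while speech is assumed present.

    Speech is assumed present when the current power exceeds
    `presence_ratio` times the tracked local minimum (an illustrative
    rule, not the decision rule of the paper).
    """
    minima = sliding_minimum(power, window)
    noise = power[0]
    estimates = []
    for p, m in zip(power, minima):
        speech_present = p > presence_ratio * m
        if not speech_present:
            noise = alpha_d * noise + (1.0 - alpha_d) * p
        estimates.append(noise)
    return estimates


if __name__ == "__main__":
    # Hypothetical bin power: the noise floor jumps from 1.0 to 5.0.
    power = [1.0] * 20 + [5.0] * 40
    long_est = recursive_noise_estimate(power, window=30)
    short_est = recursive_noise_estimate(power, window=5)
    print(long_est[40], short_est[40])
```

With the long window, the old minimum stays inside the search range for many frames after the jump, so the raised noise floor is treated as speech and the estimate is held. With the short window the minimum catches up within a few frames and the recursive average resumes, which is the general motivation for adding a shorter-window search during spectral bursts.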
Pages: 770-781
Page count: 12
Related papers
50 in total
  • [21] Integrated speech enhancement and coding in the time-frequency domain
    Drygajlo, A
    Carnero, B
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1183 - 1186
  • [22] Adaptive time-frequency data fusion for speech enhancement
    Shi, G
    Aarabi, P
    Lazic, N
    [J]. FUSION 2003: PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE OF INFORMATION FUSION, VOLS 1 AND 2, 2003, : 394 - 399
  • [23] A time-frequency smoothing neural network for speech enhancement
    Yuan, Wenhao
    [J]. SPEECH COMMUNICATION, 2020, 124 : 75 - 84
  • [24] Speech preprocessing and enhancement based on joint time domain and time-frequency domain analysis
    Zhang, Wenbo
    Xie, Xuefeng
    Du, Yanling
    Huang, Dongmei
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2024, 155 (06): : 3580 - 3588
  • [25] INVERTIBLE DNN-BASED NONLINEAR TIME-FREQUENCY TRANSFORM FOR SPEECH ENHANCEMENT
    Takeuchi, Daiki
    Yatabe, Kohei
    Koizumi, Yuma
    Oikawa, Yasuhiro
    Harada, Noboru
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6644 - 6648
  • [26] Improved a posteriori Speech Presence Probability Estimation Based on Cepstro-Temporal Smoothing and Time-Frequency Correlation
    Li, Chao
    Liu, Wenju
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1208 - 1211
  • [27] Noise-robust automatic speech recognition using Mainlobe-Resilient time-frequency quantile-based noise estimation
    Lee, SW
    Ching, PC
    Lee, T
    [J]. 2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 3, PROCEEDINGS, 2004, : 425 - 428
  • [28] Measuring time-frequency importance functions of speech with bubble noise
    Mandel, Michael I.
    Yoho, Sarah E.
    Healy, Eric W.
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2016, 140 (04): : 2542 - 2553
  • [29] Evaluation of the importance of time-frequency contributions to speech intelligibility in noise
    Yu, Chengzhu
    Wojcicki, Kamil K.
    Loizou, Philipos C.
    Hansen, John H. L.
    Johnson, Michael T.
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2014, 135 (05): : 3007 - 3016
  • [30] Time-frequency mask estimation-based speech enhancement using deep encoder-decoder neural network
    Shi, Wenhua
    Zhang, Xiongwei
    Zou, Xia
    Sun, Meng
    Li, Li
    Ren, Zhengbing
    [J]. Chinese Journal of Acoustics, 2021, 40 (01) : 141 - 154