A robust and lightweight voice activity detection algorithm for speech enhancement at low signal-to-noise ratio

Cited by: 6
Authors
Zhu, Zhehui [1]
Zhang, Lijun [1]
Pei, Kaikun [1]
Chen, Siqi [1]
Affiliations
[1] Tongji Univ, Sch Automot Studies, Shanghai 201804, Peoples R China
Keywords
Voice activity detection; Noise robust; Speech enhancement; Hybrid feature; Machine learning; FILTER;
DOI
10.1016/j.dsp.2023.104151
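Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract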
Voice Activity Detection (VAD) is a crucial component of Speech Enhancement (SE) for accurately estimating noise, which directly affects how effectively SE improves speech quality. However, conventional non-data-driven VADs often suffer from decreased accuracy at a low signal-to-noise ratio (SNR). To address this issue, a multi-feature and cosine similarity-based multi-observation VAD algorithm (mVAD) is proposed in this study. The algorithm selects noise-robust features, with Mel-frequency Cepstral Coefficients (MFCCs) as the main features, and uses several optimization techniques and an adaptive threshold for background noise updating. The soft VAD results are then smoothed with an improved exponential moving average (EMA) algorithm. In addition, a shifting window tracks the mean value to obtain an adaptive threshold for converting the soft results to binary ones. Experimental results indicate that mVAD maintains high classification accuracy down to -10 dB, with an increase of approximately 28%, while also being computationally efficient, with CPU time about 1/3 that of statistical model-based methods. It also remained highly robust at SNRs below 0 dB (Δ ≤ 2.1%). Moreover, in some cases mVAD achieved higher accuracy than deep learning-based VADs. To further demonstrate the effectiveness of the proposed method, the VAD results are used as an additional feature to train and test a neural network (NN)-based SE model, which enhances the SE performance. This study shows that mVAD does not rely on prior noise knowledge and achieves both reduced complexity and improved accuracy for speech enhancement, making it a promising approach for robust VAD in low-SNR environments. © 2023 Elsevier Inc. All rights reserved.
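The post-processing chain named in the abstract (cosine similarity against a noise reference, EMA smoothing, and a window-mean adaptive threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain (non-improved) EMA, the trailing-window mean, and the parameters `alpha`, `window`, and `margin` are all assumptions made for illustration, and the per-frame feature vectors are taken as given rather than computed from audio.

```python
import math
from collections import deque


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def soft_scores(frames, noise_ref):
    """Hypothetical soft score: 1 - similarity to a fixed noise reference."""
    return [1.0 - cosine_similarity(f, noise_ref) for f in frames]


def ema_smooth(scores, alpha=0.5):
    """Plain exponential moving average (assumption: not the paper's improved variant)."""
    smoothed = []
    prev = None
    for s in scores:
        prev = s if prev is None else alpha * s + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


def binarize(scores, window=4, margin=0.0):
    """Mark a frame as speech (1) when its score exceeds the mean of a
    trailing window of recent scores (current frame included) plus a margin."""
    recent = deque(maxlen=window)
    decisions = []
    for s in scores:
        recent.append(s)
        threshold = sum(recent) / len(recent) + margin
        decisions.append(1 if s > threshold else 0)
    return decisions
```

A typical call would chain the three steps, for example `binarize(ema_smooth(soft_scores(frames, noise_ref)))`. Because the threshold here follows the trailing mean, long stretches of speech pull it upward; a practical detector would need a longer window or an explicit noise-update rule such as the adaptive background updating the abstract mentions.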
Pages: 14