A robust and lightweight voice activity detection algorithm for speech enhancement at low signal-to-noise ratio

Cited by: 6
Authors
Zhu, Zhehui [1]
Zhang, Lijun [1]
Pei, Kaikun [1]
Chen, Siqi [1]
Affiliations
[1] Tongji Univ, Sch Automot Studies, Shanghai 201804, Peoples R China
Keywords
Voice activity detection; Noise robust; Speech enhancement; Hybrid feature; Machine learning; FILTER;
DOI
10.1016/j.dsp.2023.104151
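Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code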
0808 ; 0809 ;
Abstract
Voice Activity Detection (VAD) is a crucial component of Speech Enhancement (SE) for accurately estimating noise, which directly affects the effectiveness of SE in improving speech quality. However, conventional non-data-driven VADs often suffer from decreased accuracy at low signal-to-noise ratios (SNRs). To address this issue, a multi-feature, cosine similarity-based multi-observation VAD algorithm (mVAD) is proposed in this study. The algorithm selects noise-robust features, with Mel-frequency Cepstral Coefficients (MFCCs) as the main features, and uses several optimization techniques and an adaptive threshold for background noise updating. Furthermore, the soft VAD results are smoothed with an improved exponential moving average (EMA) algorithm. In addition, a shifting window is used to track the mean value and obtain an adaptive threshold for converting the soft results to binary ones. Experimental results indicate that mVAD can maintain high classification accuracy down to -10 dB, with an increment of approximately 28%, while also being computationally efficient in CPU time (about 1/3 of that of statistical model-based methods). It also maintained high robustness at SNRs below 0 dB (Δ ≤ 2.1%). Moreover, in some cases mVAD even achieved higher accuracy than deep learning-based VADs. To further demonstrate the effectiveness of the proposed method, the VAD results are used as an additional feature to train and test a neural network (NN)-based SE model, which enhances the SE performance. This study shows that mVAD does not rely on prior noise knowledge and achieves both complexity reduction and accuracy improvement for speech enhancement, making it a promising approach for robust VAD in low-SNR environments. © 2023 Elsevier Inc. All rights reserved.
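The pipeline outlined in the abstract (a cosine-similarity score against a background noise estimate, EMA smoothing of the soft results, and a shifting-window mean used as an adaptive threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the feature set details, the "improved" EMA, or the threshold rule, so the sketch substitutes a plain EMA and a simple window mean plus margin. All function names, the fixed `noise_ref` vector, and the `alpha`, `window`, and `margin` parameters are assumptions made for illustration only.

```python
import math
from collections import deque


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def soft_scores(frames, noise_ref):
    """Soft score per frame: dissimilarity to a (hypothetical) fixed noise reference."""
    return [1.0 - cosine_similarity(f, noise_ref) for f in frames]


def ema_smooth(scores, alpha=0.5):
    """Plain exponential moving average (assumption: not the paper's improved variant)."""
    smoothed = []
    prev = None
    for s in scores:
        prev = s if prev is None else alpha * s + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


def binarize(scores, window=4, margin=0.05):
    """Mark a frame as speech when its score exceeds the sliding-window mean plus a margin."""
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    decisions = []
    for s in scores:
        recent.append(s)
        threshold = sum(recent) / len(recent) + margin
        decisions.append(1 if s > threshold else 0)
    return decisions


if __name__ == "__main__":
    # Toy two-dimensional "features": three noise-like frames, two speech-like, one noise-like.
    noise_ref = [1.0, 0.0]
    frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2 + [[1.0, 0.0]]
    print(binarize(ema_smooth(soft_scores(frames, noise_ref))))
```

In practice the feature vectors would be per-frame MFCCs (and the other features the paper selects), and the noise reference would be updated adaptively rather than held fixed. Because the toy threshold tracks the recent mean, a long sustained speech segment would eventually pull the threshold up to its own level, which is one reason a real detector needs a more careful threshold rule than this one.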
Pages: 14