A robust and lightweight voice activity detection algorithm for speech enhancement at low signal-to-noise ratio

Cited by: 6
Authors
Zhu, Zhehui [1 ]
Zhang, Lijun [1 ]
Pei, Kaikun [1 ]
Chen, Siqi [1 ]
Affiliations
[1] Tongji Univ, Sch Automot Studies, Shanghai 201804, Peoples R China
Keywords
Voice activity detection; Noise robust; Speech enhancement; Hybrid feature; Machine learning; FILTER;
DOI
10.1016/j.dsp.2023.104151
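Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification code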
0808 ; 0809 ;
Abstract
Voice Activity Detection (VAD) is a crucial component of Speech Enhancement (SE) for accurately estimating noise, which directly affects the SE effectiveness in improving speech quality. However, conventional non-data-driven VADs often suffer from decreased accuracy at a low signal-to-noise ratio (SNR). To address this issue, a multi-feature and cosine similarity-based multi-observation VAD algorithm (mVAD) is proposed in this study. This algorithm selects noise-robust features, with Mel-frequency Cepstral Coefficients (MFCCs) as the main features, and utilizes several optimization techniques and an adaptive threshold for background noise updating. Furthermore, the soft VAD results are smoothed with an improved exponential moving average (EMA) algorithm. In addition, a shifting window is utilized to track the mean value and obtain an adaptive threshold for converting the soft results to binary ones. Experimental results indicate that mVAD can maintain high classification accuracy down to -10 dB with an increment of approximately 28% while also being computationally efficient in CPU time (about 1/3 of statistical model-based methods). It also maintained high robustness at SNRs less than 0 dB (Δ ≤ 2.1%). Moreover, in some cases mVAD even achieved higher accuracy levels than deep learning-based VADs. To further demonstrate the effectiveness of the proposed method, the VAD results are used as an additional feature to train and test a neural network (NN)-based SE model, enhancing the SE performance. This study shows that mVAD does not rely on prior noise knowledge, achieving the dual effect of complexity reduction and accuracy improvement for speech enhancement, making it a promising approach for robust VAD in low SNR environments. © 2023 Elsevier Inc. All rights reserved.
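The abstract names three post-processing steps: a cosine similarity-based soft decision, EMA smoothing, and a shifting-window mean used as an adaptive threshold. The sketch below illustrates that general pipeline only and is not the authors' mVAD implementation. The function names, the use of a fixed noise reference vector, the plain (not "improved") EMA, and the values of `alpha` and `window` are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import deque


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def soft_vad(frames, noise_ref):
    """Soft score per frame: 1 - similarity to a noise reference (assumed fixed here).

    Higher scores mean the frame looks less like the noise reference.
    """
    return [1.0 - cosine_similarity(f, noise_ref) for f in frames]


def ema(values, alpha=0.5):
    """Plain exponential moving average; alpha is a hypothetical smoothing factor."""
    out = []
    prev = None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


def binarize(scores, window=4):
    """Mark a frame as speech when its score exceeds the mean of the last `window` scores.

    The window includes the current frame; its length is an assumption.
    """
    recent = deque(maxlen=window)
    decisions = []
    for s in scores:
        recent.append(s)
        threshold = sum(recent) / len(recent)
        decisions.append(s > threshold)
    return decisions
```

In use, per-frame feature vectors (for example MFCCs, as in the abstract) would be passed through `soft_vad`, then `ema`, then `binarize`. How the background noise estimate is updated over time is left out here because the abstract does not describe it in enough detail.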
Pages: 14