LONG-TERM AUTO-CORRELATION STATISTICS BASED VOICE ACTIVITY DETECTION FOR STRONG NOISY SPEECH

Cited by: 0
Authors
Shi, Wei [1 ,2 ]
Zou, Yuexian [2 ]
Liu, Yi [1 ]
Affiliations
[1] PKU HKUST Shenzhen HongKong Inst, Shenzhen Key Lab Intelligent Media & Speech, Shenzhen, Peoples R China
[2] Peking Univ, Sch Elect Comp Engn, ADSPLAB ELIP, Shenzhen, Peoples R China
Keywords
long-term auto-correlation statistics; voice activity detection; strong noisy speech; SIGNAL
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper proposes a voice activity detection (VAD) algorithm based on a novel long-term metric. Assuming that the most significant difference between noisy speech and non-speech is the harmonicity of the noisy speech spectrum, which is inherent to the human voice, the long-term autocorrelation statistics (LTACS) measure is designed and shown to be a powerful metric for VAD. The LTACS measure is calculated over several successive frames around the frame of interest, and it represents the significance of the harmonics of the signal spectrum over a long term rather than a short term. A novel LTACS-based VAD algorithm is derived by jointly using the minimum operator to reduce non-speech variability and then calculating the variance to detect speech. Simulated comparisons with four standardized VAD algorithms (ETSI adaptive multi-rate options 1 and 2, ETSI advanced front-end, and G.729 Annex B) as well as three previously proposed VAD algorithms show that the proposed LTACS-based VAD algorithm achieves the best performance under all SNR conditions, especially in strongly noisy environments (e.g., an SNR of -5 dB or -10 dB).
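The abstract does not give the formulas behind the LTACS measure, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes that harmonicity is measured by the normalized autocorrelation of each frame's magnitude spectrum along the frequency axis, that the minimum is taken per lag over a few neighboring frames, and that the variance is then taken across lags. The function names, the frame and lag sizes, and the `context` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping Hann-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)


def spectral_autocorr(frames, min_lag=2, max_lag=40):
    """Normalized autocorrelation of each frame's magnitude spectrum,
    taken along the frequency axis (assumed harmonicity cue)."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    mag = mag - mag.mean(axis=1, keepdims=True)
    energy = np.sum(mag * mag, axis=1) + 1e-12
    ac = [np.sum(mag[:, :-k] * mag[:, k:], axis=1)
          for k in range(min_lag, max_lag + 1)]
    return np.stack(ac, axis=1) / energy[:, None]


def long_term_score(ac, context=2):
    """Per frame: elementwise minimum over neighboring frames, then
    variance across lags (assumed ordering of the two operators)."""
    n = ac.shape[0]
    score = np.empty(n)
    for t in range(n):
        lo, hi = max(0, t - context), min(n, t + context + 1)
        score[t] = np.var(ac[lo:hi].min(axis=0))
    return score


# Synthetic check: 1 s of white noise, then 1 s of a harmonic tone plus noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
tone = sum(np.sin(2 * np.pi * 200 * h * t) for h in range(1, 11))
x = np.concatenate([np.zeros(fs), tone]) + rng.standard_normal(2 * fs)

frames = frame_signal(x)
ac = spectral_autocorr(frames)
score = long_term_score(ac)
print("noise-only mean score:", score[:50].mean())
print("tone+noise mean score:", score[70:].mean())
```

In this toy setup the score is expected to be larger in the harmonic segment, because the spectral autocorrelation of a harmonic signal keeps a stable peak-and-trough pattern across neighboring frames, while for noise the per-lag minimum flattens it. A real detector would still need a decision threshold and tuning on actual noisy speech, which the abstract does not specify.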
Pages: 100-104
Page count: 5