Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection

Times cited: 7
Authors
Fujimoto, Masakiyo [1]
Watanabe, Shinji [1]
Nakatani, Tomohiro [1]
Affiliation
[1] NTT Corp 2 4, NTT Commun Sci Labs, Seika, Kyoto 6190237, Japan
Keywords
Voice activity detection; Switching Kalman filter; Gaussian pruning; Posterior probability; Gaussian weight normalization
DOI
10.1016/j.specom.2011.08.005
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper proposes a voice activity detection (VAD) method that is robust to noise. For noise-robust VAD, we have previously proposed statistical models and a switching Kalman filter (SKF)-based technique. In this paper, we focus on a model re-estimation method using Gaussian pruning with weight normalization. The statistical model for SKF-based VAD is constructed using Gaussian mixture models (GMMs) and consists of pre-trained silence and clean speech GMMs and a sequentially estimated noise GMM. However, the composed model is not optimal because it does not fully reflect the characteristics of the observed signal. Thus, to ensure the optimality of the composed model, we investigate a method for re-estimating it so that it reflects the characteristics of the observed signal sequence. Since our VAD method relies on frame-wise sequential processing, keeping latency as small as possible is very important. In this situation, there is insufficient data for re-estimating all the Gaussian parameters. To solve this problem, we propose a model re-estimation method that extracts reliable characteristics using Gaussian pruning with weight normalization. Namely, the proposed method re-estimates the model by pruning non-dominant Gaussian distributions, thereby expressing the local characteristics of each frame, and by normalizing the Gaussian weights of the remaining distributions. In an experiment using CENSREC-1-C, a speech corpus for VAD evaluation, the proposed method significantly improved VAD performance compared with that of the original SKF-based VAD. This result confirms that the proposed Gaussian pruning contributes to improved VAD accuracy. (C) 2011 Elsevier B.V. All rights reserved.
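The abstract describes the re-estimation step only in words: drop the non-dominant Gaussians for the current frame and renormalize the weights of the survivors. The Python sketch below illustrates that idea on a single diagonal-covariance GMM. It is not the authors' implementation. The pruning criterion (keep the smallest set of components, ranked by per-frame posterior, whose cumulative posterior reaches a hypothetical threshold `mass`), the helper names (`Gaussian`, `prune_and_normalize`, etc.), and the single-GMM setting are all assumptions; the paper's composed silence/speech/noise model and the SKF machinery are not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    """One mixture component with diagonal covariance (hypothetical container)."""
    weight: float
    mean: list[float]
    var: list[float]  # diagonal of the covariance matrix


def log_gauss(x: list[float], mean: list[float], var: list[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def weighted_logs(x: list[float], gmm: list[Gaussian]) -> list[float]:
    """log(w_k) + log N(x; mu_k, Sigma_k) for every component k."""
    return [math.log(g.weight) + log_gauss(x, g.mean, g.var) for g in gmm]


def log_likelihood(x: list[float], gmm: list[Gaussian]) -> float:
    """Log-likelihood of frame x under the mixture, computed with log-sum-exp."""
    logs = weighted_logs(x, gmm)
    top = max(logs)
    return top + math.log(sum(math.exp(lw - top) for lw in logs))


def component_posteriors(x: list[float], gmm: list[Gaussian]) -> list[float]:
    """Posterior probability of each component given frame x."""
    total = log_likelihood(x, gmm)
    return [math.exp(lw - total) for lw in weighted_logs(x, gmm)]


def prune_and_normalize(
    x: list[float], gmm: list[Gaussian], mass: float = 0.95
) -> list[Gaussian]:
    """Frame-wise Gaussian pruning with weight normalization (sketch).

    ASSUMED criterion: keep components in order of decreasing posterior
    until their cumulative posterior reaches `mass`, then rescale the
    surviving prior weights to sum to one. Means and variances are unchanged.
    """
    post = component_posteriors(x, gmm)
    order = sorted(range(len(gmm)), key=lambda k: post[k], reverse=True)
    kept: list[int] = []
    acc = 0.0
    for k in order:
        kept.append(k)
        acc += post[k]
        if acc >= mass:
            break
    # Weight normalization: the surviving weights are divided by their sum.
    wsum = sum(gmm[k].weight for k in kept)
    return [
        Gaussian(gmm[k].weight / wsum, gmm[k].mean, gmm[k].var)
        for k in sorted(kept)
    ]
```

Ranking by cumulative posterior mass lets the number of surviving components vary from frame to frame; a fixed top-K rule would be an equally plausible reading of the abstract. Renormalizing only the weights keeps the pruned mixture a proper density without touching means or variances, which is consistent with the abstract's point that a single frame offers too little data to re-estimate every Gaussian parameter.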
Pages: 229-244
Number of pages: 16
Related papers
5 in total
  • [1] Voice Activity Detection Using Frame-Wise Model Re-Estimation Method Based on Gaussian Pruning with Weight Normalization
    Fujimoto, Masakiyo
    Watanabe, Shinji
    Nakatani, Tomohiro
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010: 3102-3105
  • [2] Noise robust model-based Voice Activity Detection
    de la Torre, Angel
    Ramirez, Javier
    Benitez, Carmen
    Segura, Jose C.
    Garcia, Luz
    Rubio, Antonio J.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 1954-1957
  • [3] Robust voice activity detection algorithm based on complex Gaussian mixture model
    Lei, Jian-Jun
    Yang, Zhen
    Liu, Gang
    Guo, Jun
    Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/Journal of Tianjin University Science and Technology, 2009, 42(04): 353-356
  • [4] A Normalized Kurtosis based Voice Activity Detection Method under Non-Gaussian Non-stationary Noise
    Zhang, Yi
    Li, Xiaomei
    Zhang, Yuxia
    2011 INTERNATIONAL CONFERENCE ON FUTURE COMPUTER SCIENCE AND APPLICATION (FCSA 2011), VOL 1, 2011: 426-429
  • [5] Noise robust voice activity detection based on statistical model and parallel non-linear Kalman filtering
    Fujimoto, Masakiyo
    Ishizuka, Kentaro
    Kato, Hiroko
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 797+