Deep Belief Networks Based Voice Activity Detection

Times cited: 247
Authors
Zhang, Xiao-Lei [1]
Wu, Ji [1]
Affiliation
[1] Tsinghua Univ, Dept Elect Engn, Multimedia Signal & Intelligent Informat Proc Lab, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Deep learning; information fusion; voice activity detection; STATISTICAL-MODEL; MULTIPITCH TRACKING; ALGORITHM; CLASSIFICATION; SEGREGATION; NOISY;
DOI
10.1109/TASL.2012.2229986
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Fusing the advantages of multiple acoustic features is important for robust voice activity detection (VAD). Recently, machine-learning-based VADs have shown superiority over traditional VADs on multiple-feature fusion tasks. However, existing machine-learning-based VADs use only shallow models, which cannot explore the underlying manifold of the features. In this paper, we propose to fuse multiple features via a deep model called the deep belief network (DBN). A DBN is a powerful hierarchical generative model for feature extraction; it can describe highly variant functions and discover the manifold of the features. We take the serially concatenated features as the input layer of the DBN, and then extract a new feature by transferring these features through multiple nonlinear hidden layers. Finally, we predict the class of the new feature with a linear classifier. Our analysis further shows that even a belief network with a single hidden layer is as powerful as the state-of-the-art models in machine-learning-based VADs. In our empirical comparison, ten common features are used for performance analysis. Extensive experimental results on the AURORA2 corpus show that the DBN-based VAD not only outperforms eleven referenced VADs but also meets the real-time detection requirements of VAD. The results also show that the DBN-based VAD can effectively fuse the advantages of multiple features.
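The pipeline the abstract describes (serial concatenation of several feature types, several nonlinear hidden layers, then a linear classifier) can be sketched in plain Python. This is a minimal, forward-pass-only sketch under stated assumptions, not the authors' implementation: the sigmoid hidden units, the weight values, the function names (`concat_features`, `dbn_features`, `linear_decision`), and the zero-threshold decision rule are all hypothetical. In the paper the weights would come from training the DBN, which this sketch omits entirely.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (W, b): W[j] holds the input weights of hidden unit j


def concat_features(streams: Sequence[Sequence[Vector]]) -> List[Vector]:
    """Serially concatenate per-frame feature streams.

    streams[k][t] is the vector of feature type k at frame t; all streams
    must cover the same number of frames.
    """
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all feature streams must have the same frame count")
    return [[v for s in streams for v in s[t]] for t in range(n_frames)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (assumed hidden-unit nonlinearity)."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hidden_layer(x: Vector, layer: Layer) -> Vector:
    """One nonlinear layer: h[j] = sigmoid(sum_i W[j][i] * x[i] + b[j])."""
    W, b = layer
    if len(W) != len(b) or any(len(row) != len(x) for row in W):
        raise ValueError("layer shape does not match its input")
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def dbn_features(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Transfer the concatenated input through every hidden layer in turn."""
    h = x
    for layer in layers:
        h = hidden_layer(h, layer)
    return h


def linear_decision(h: Vector, w: Vector, bias: float) -> int:
    """Hypothetical linear classifier on the top-layer feature: 1 = speech, 0 = non-speech."""
    if len(w) != len(h):
        raise ValueError("classifier weights do not match feature size")
    return 1 if sum(wi * hi for wi, hi in zip(w, h)) + bias > 0.0 else 0


# Toy usage: two hypothetical feature types over two frames.
streams = [
    [[0.9], [0.1]],            # feature type A, 1 value per frame
    [[0.8, 0.7], [0.0, 0.2]],  # feature type B, 2 values per frame
]
layers = [  # hypothetical hand-set weights; a real system would learn these
    ([[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]], [-1.0, 1.0]),  # 3 inputs -> 2 units
    ([[2.0, -2.0], [-2.0, 2.0]], [0.0, 0.0]),              # 2 inputs -> 2 units
]
frames = concat_features(streams)
decisions = [linear_decision(dbn_features(f, layers), [1.0, -1.0], 0.0)
             for f in frames]
print(frames)     # [[0.9, 0.8, 0.7], [0.1, 0.0, 0.2]]
print(decisions)  # [1, 0]
```

Passing only the first layer (`layers[:1]`, with classifier weights sized to match) gives the single-hidden-layer variant that the abstract compares against state-of-the-art models. Per frame, the sketch costs one matrix-vector product per layer plus one dot product for the classifier.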
Pages: 697-710
Page count: 14