A Universal VAD Based on Jointly Trained Deep Neural Networks

Cited by: 0
Authors
Wang, Qing [1 ]
Du, Jun [1 ]
Bao, Xiao [1 ]
Wang, Zi-Rui [1 ]
Dai, Li-Rong [1 ]
Lee, Chin-Hui [2 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
voice activity detection; deep neural network; feature mapping; joint training; speech recognition; noise
DOI: not available
Chinese Library Classification: O42 [Acoustics]
Discipline classification codes: 070206; 082403
Abstract
In this paper, we propose a joint training approach to voice activity detection (VAD) to address the performance degradation caused by unseen noise conditions. Two key techniques are integrated into this deep neural network (DNN) based VAD framework. First, a regression DNN is trained to map noisy speech features to clean ones, similar to DNN-based speech enhancement. Second, the VAD component, which discriminates speech from noise backgrounds, is also a DNN, trained on a large amount of diversified noisy data synthesized with a wide range of additive noise types. By stacking the classification DNN on top of the enhancement DNN, the integrated DNN can be jointly trained to perform VAD. The feature-mapping DNN serves as a noise normalization module that explicitly generates "clean" features, which are easier for the subsequent classification DNN to recognize correctly. Our experimental results demonstrate that the proposed noise-universal DNN-based VAD algorithm generalizes well to unseen noises, and that the jointly trained DNNs consistently and significantly outperform the conventional classification-based DNN for all noise types and signal-to-noise ratios tested.
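The abstract describes a classification DNN stacked on top of a feature-mapping (enhancement) DNN, with the whole stack trained jointly for VAD. Below is a minimal PyTorch-style sketch of that stacking and one joint update step; the layer sizes, the 257-dimensional log-power-spectrum features, the auxiliary MSE term, and its weight alpha are illustrative assumptions rather than the paper's reported configuration.

# Minimal sketch of the stacked "enhancement DNN + classification DNN" joint
# training idea described in the abstract. Layer sizes, the 257-dim feature
# assumption, and the auxiliary loss weight are illustrative, not the paper's
# exact configuration.
import torch
import torch.nn as nn

FEAT_DIM = 257  # assumed per-frame feature dimension (e.g., log-power spectrum)

class EnhancementDNN(nn.Module):
    """Regression DNN: maps noisy features to estimates of clean features."""
    def __init__(self, dim=FEAT_DIM, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )
    def forward(self, x):
        return self.net(x)

class ClassificationDNN(nn.Module):
    """Classification DNN: frame-level speech / non-speech decision."""
    def __init__(self, dim=FEAT_DIM, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # logits for [non-speech, speech]
        )
    def forward(self, x):
        return self.net(x)

class JointVAD(nn.Module):
    """Classification DNN stacked on top of the enhancement DNN."""
    def __init__(self):
        super().__init__()
        self.enhancer = EnhancementDNN()
        self.classifier = ClassificationDNN()
    def forward(self, noisy):
        enhanced = self.enhancer(noisy)     # "clean" feature estimate
        logits = self.classifier(enhanced)  # VAD decision from enhanced features
        return logits, enhanced

def joint_training_step(model, optimizer, noisy, clean, labels, alpha=0.1):
    """One joint update: VAD cross-entropy plus an (assumed) auxiliary
    feature-mapping MSE term weighted by alpha."""
    logits, enhanced = model(noisy)
    loss = nn.functional.cross_entropy(logits, labels) \
           + alpha * nn.functional.mse_loss(enhanced, clean)
    optimizer.zero_grad()
    loss.backward()                         # gradients flow through both DNNs
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = JointVAD()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    noisy = torch.randn(32, FEAT_DIM)       # toy batch of noisy frames
    clean = torch.randn(32, FEAT_DIM)       # corresponding clean frames
    labels = torch.randint(0, 2, (32,))     # frame-level speech/non-speech labels
    print(joint_training_step(model, opt, noisy, clean, labels))

In practice each DNN would likely be pretrained separately (the enhancer on noisy-to-clean feature pairs, the classifier on enhanced features) before the joint fine-tuning shown above; whether an auxiliary enhancement loss is kept during joint training is an assumption here, not something stated in the abstract.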
Pages: 2282-2286
Page count: 5