Joint Training ResCNN-based Voice Activity Detection with Speech Enhancement

Times cited: 0
Authors
Xu, Tianjiao [1]
Zhang, Hui [1]
Zhang, Xueliang [1]
Affiliation
[1] Department of Computer Science, Inner Mongolia University, Hohhot, People's Republic of China
Keywords
NEURAL-NETWORK
DOI
10.1109/apsipaasc47483.2019.9023101
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Voice activity detection (VAD) is considered a solved problem in noise-free conditions, but it remains challenging in noisy conditions with a low signal-to-noise ratio (SNR). Intuitively, reducing noise should improve VAD. In this study, we therefore introduce a speech enhancement module to reduce noise. Specifically, a speech enhancement module with an encoder-decoder structure based on a convolutional recurrent neural network (CRN) is trained to suppress noise. The low-dimensional feature code from its encoder, together with the raw spectrum of the noisy speech, is then fed into a VAD module based on a deep residual convolutional neural network (ResCNN). The speech enhancement and VAD modules are connected and trained jointly. To balance the training speed of the two modules, an empirical dynamic gradient balance strategy is proposed. Experimental results show that the proposed joint-training method has clear advantages in generalization ability.
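The abstract does not specify how the encoder code and the noisy spectrum are fused, nor how the "empirical dynamic gradient balance strategy" works. The sketch below illustrates the two ideas under stated assumptions only. `vad_input_features` is a hypothetical helper that concatenates the two inputs frame by frame. For the balancing step it swaps in a different, published heuristic, Dynamic Weight Average (DWA) from the multi-task learning literature (Liu et al., CVPR 2019), which reweights each module's loss (and therefore its gradient contribution) according to how fast that loss is falling. This is not the authors' strategy; the names `dwa_weights`, `joint_loss`, and the task keys `"se"` and `"vad"` are illustrative.

```python
import math
from typing import Dict, List, Sequence


def vad_input_features(noisy_spectrum: Sequence[Sequence[float]],
                       encoder_code: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical fusion: concatenate, per frame, the raw noisy spectrum
    with the SE encoder's low-dimensional code. Both inputs are [frames][dims].
    The paper's actual fusion is not described in the abstract."""
    if len(noisy_spectrum) != len(encoder_code):
        raise ValueError("spectrum and encoder code must have the same number of frames")
    return [list(s) + list(c) for s, c in zip(noisy_spectrum, encoder_code)]


def dwa_weights(loss_history: Dict[str, List[float]],
                temperature: float = 2.0) -> Dict[str, float]:
    """Dynamic Weight Average, used here as a stand-in for the paper's
    unspecified balance strategy. loss_history maps a task name to its
    per-epoch losses. A task whose loss is decreasing more slowly
    (ratio L(t-1)/L(t-2) closer to 1) receives a larger weight.
    The returned weights sum to the number of tasks."""
    num_tasks = len(loss_history)
    if any(len(h) < 2 for h in loss_history.values()):
        # Not enough history yet: fall back to equal weights.
        return {name: 1.0 for name in loss_history}
    rates = {name: h[-1] / h[-2] for name, h in loss_history.items()}
    exps = {name: math.exp(r / temperature) for name, r in rates.items()}
    total = sum(exps.values())
    return {name: num_tasks * e / total for name, e in exps.items()}


def joint_loss(se_loss: float, vad_loss: float, weights: Dict[str, float]) -> float:
    """Weighted sum minimized jointly by the SE and VAD modules."""
    return weights["se"] * se_loss + weights["vad"] * vad_loss


if __name__ == "__main__":
    # Illustrative numbers only: the SE loss halved last epoch, the VAD loss
    # dropped by 10%, so DWA shifts weight toward the slower VAD branch.
    history = {"se": [1.0, 0.5], "vad": [0.8, 0.72]}
    w = dwa_weights(history)
    print(w)
    print(joint_loss(0.5, 0.72, w))
```

With equal weights, a fast-converging enhancement loss could dominate the shared encoder's updates; a rate-based weighting like DWA is one simple way to keep the two branches progressing at comparable speeds, which is the goal the abstract states.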
Pages: 1157-1162
Number of pages: 6