Enhancement of speech dynamics for voice activity detection using DNN

Authors
Suci Dwijayanti
Kei Yamamori
Masato Miyoshi
Affiliations
[1] Graduate School of Natural Science and Technology
[2] Kanazawa University
[3] Kakuma Campus
[4] Department of Electrical Engineering
[5] Universitas Sriwijaya
Keywords
Voice activity detection; Dynamics; Speech period candidates; Deep neural network
DOI
Not available
Abstract
Voice activity detection (VAD) is an important preprocessing step in many speech applications: it identifies the speech and non-speech periods of an input signal. In this paper, we propose a deep neural network (DNN)-based VAD method that detects these periods in noisy signals using speech dynamics, that is, the time variation of the speech signal, which is commonly expressed as the first- and second-order derivatives of mel cepstra (the delta and delta-delta features). Rather than using these derivatives directly, the proposed method highlights the dynamics with speech period candidates, which are calculated from heuristic rules on the patterns of the first and second derivatives of the input signals. These candidates, together with the log power spectra, are input into the DNN to obtain VAD decisions. Experiments compare the proposed method with a DNN-based method that uses only log power spectra, on speech signals corrupted by five types of noise (white, babble, factory, car, and pink) at signal-to-noise ratios (SNRs) of 10, 5, 0, and −5 dB. The results show that the proposed method is superior under all the considered noise conditions, indicating that adding the speech period candidates to the log power spectra improves detection.
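Two quantities named in the abstract can be illustrated with a minimal sketch: mixing noise into speech at a target SNR, and taking first- and second-order differences of a feature track over time. The function names `power`, `mix_at_snr`, and `delta` are hypothetical, and the simple central-difference rule is an assumption made for illustration. It is not the paper's heuristic rule for speech period candidates, which the abstract does not specify.

```python
import math

def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power matches snr_db, then add it."""
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def delta(track):
    """First-order difference over time (assumed rule, needs len(track) >= 2).

    Central difference for interior frames, one-sided difference at the edges.
    Applying it twice gives a second-order (delta-delta style) track.
    """
    n = len(track)
    out = []
    for t in range(n):
        lo, hi = max(t - 1, 0), min(t + 1, n - 1)
        out.append((track[hi] - track[lo]) / (hi - lo))
    return out

# Example: a toy "speech" tone mixed with alternating-sign "noise" at the
# four SNRs listed in the abstract.
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [0.3 if i % 2 == 0 else -0.3 for i in range(1000)]
noisy = {snr: mix_at_snr(speech, noise, snr) for snr in (10, 5, 0, -5)}
```

Practical systems usually compute delta features with a regression window spanning several frames rather than a two-point difference; the short window here only keeps the sketch easy to check by hand.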
Related papers
  • [1] Enhancement of speech dynamics for voice activity detection using DNN
    Dwijayanti, Suci
    Yamamori, Kei
    Miyoshi, Masato
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018,
  • [2] Voice Activity Detection for Speech Enhancement Applications
    Verteletskaya, E.
    Sakhnov, K.
    [J]. ACTA POLYTECHNICA, 2010, 50 (04) : 100 - 105
  • [3] DNN-BASED VOICE ACTIVITY DETECTION USING AUXILIARY SPEECH MODELS IN NOISY ENVIRONMENTS
    Tachioka, Yuuki
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5529 - 5533
  • [4] A unified approach to speech enhancement and voice activity detection
    Kasap, Ceyhan
    Arslan, Mustafa Levent
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2013, 21 (02) : 527 - 547
  • [5] Speech enhancement through voice activity detection using speech absence probability based on Teager energy
    Yun-sik Park
    Sang-min Lee
    [J]. Journal of Central South University, 2013, 20 : 424 - 432
  • [6] Voice Activity Detection Using Global Speech Absence Probability Based on Teager Energy for Speech Enhancement
    Park, Yun-Sik
    Lee, Sangmin
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (10) : 2568 - 2571
  • [7] Speech enhancement through voice activity detection using speech absence probability based on Teager energy
    Park, Yun-sik
    Lee, Sang-min
    [J]. JOURNAL OF CENTRAL SOUTH UNIVERSITY, 2013, 20 (02) : 424 - 432
  • [8] Speech enhancement through voice activity detection using speech absence probability based on Teager energy
    PARK Yun-sik
    LEE Sang-min
    [J]. Journal of Central South University, 2013, 20 (02) : 424 - 432
  • [9] Gaussian Process Regression for Voice Activity Detection and Speech Enhancement
    Park, Sunho
    Choi, Seungjin
    [J]. 2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 2879 - 2882
  • [10] A Hierarchical Framework Approach for Voice Activity Detection and Speech Enhancement
    Zhang, Yan
    Tang, Zhen-min
    Li, Yan-ping
    Luo, Yang
    [J]. SCIENTIFIC WORLD JOURNAL, 2014,