Noise robust voice activity detection using joint phase and magnitude based feature enhancement

Cited by: 0
Authors
Khomdet Phapatanaburi
Longbiao Wang
Zeyan Oo
Weifeng Li
Seiichi Nakagawa
Masahiro Iwahashi
Affiliations
[1] Nagaoka University of Technology, Tianjin Key Laboratory of Cognitive Computing and Application
[2] School of Computer Science and Technology, Graduate School at Shenzhen
[3] Tianjin University
[4] Tsinghua University
[5] Toyohashi University of Technology
Keywords
Deep neural network (DNN); Phase information; Noise-robust VAD; Feature enhancement;
DOI: Not available
Abstract
Recently, deep neural network (DNN)-based feature enhancement has been proposed for many speech applications, and DNN-enhanced features have achieved higher performance than raw features. However, phase information is discarded during most conventional DNN training. In this paper, we propose two methods for noise-robust voice activity detection (VAD): DNN-based joint phase- and magnitude-based feature (JPMF) enhancement (JPMF with DNN) and noise-aware training (NAT)-DNN-based JPMF enhancement (JPMF with NAT-DNN). To further improve the performance of the proposed feature enhancement, the scores of the proposed phase- and magnitude-based features are also combined. Specifically, mel-frequency cepstral coefficients (MFCCs) are used as the magnitude feature and the mel-frequency delta phase (MFDP) as the phase feature. The experimental results show that the proposed feature enhancement significantly outperforms conventional magnitude-based feature enhancement. The proposed JPMF with NAT-DNN method achieves the best relative equal error rate (EER) compared with individual magnitude- and phase-based DNN speech enhancement. Moreover, the combined score of the enhanced MFCC and MFDP using JPMF with NAT-DNN further improves VAD performance.
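The abstract mentions two evaluation ingredients, score combination and the equal error rate (EER), without giving formulas. The following is a minimal sketch of how they are commonly computed, not the authors' implementation. The weighted-sum fusion rule, the weight `alpha`, and the function names `combine_scores` and `equal_error_rate` are all assumptions introduced here for illustration.

```python
import numpy as np


def combine_scores(mag_scores, phase_scores, alpha=0.5):
    """Fuse per-frame speech scores from two feature streams.

    Assumption: a simple weighted sum; alpha is a hypothetical fusion
    weight, not a value taken from the paper.
    """
    mag = np.asarray(mag_scores, dtype=float)
    phase = np.asarray(phase_scores, dtype=float)
    return alpha * mag + (1.0 - alpha) * phase


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 (speech) or 0 (non-speech) per frame; both classes must occur.
    Returns the mean of the false-acceptance and false-rejection rates at
    the threshold where the two rates are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, 1.0
    for threshold in np.unique(scores):
        decisions = scores >= threshold
        far = np.mean(decisions[~labels])  # non-speech frames accepted as speech
        frr = np.mean(~decisions[labels])  # speech frames rejected as non-speech
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

In this sketch, the fused scores from `combine_scores` would be passed to `equal_error_rate` together with frame-level speech/non-speech labels; a lower EER indicates better detection.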
Pages: 845 - 859
Page count: 14
Related papers
50 items in total
  • [31] A robust voice activity detection based on wavelet transform
    Aghajani, Kh.
    Manzuri, M. T.
    Karami, M.
    Tayebi, H.
    2008 SECOND INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, 2008, : 37 - +
  • [32] Formant-Based Robust Voice Activity Detection
    Yoo, In-Chul
    Lim, Hyeontaek
    Yook, Dongsuk
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (12) : 2238 - 2245
  • [33] Robust Voice Activity Detection Algorithm Based on Feature of Frequency Modulation of Harmonics and Its DSP Implementation
    Hsu, Chung-Chien
    Cheong, Kah-Meng
    Chi, Tai-Shih
    Tsao, Yu
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2015, E98D (10) : 1808 - 1817
  • [34] Speech waveform compression using robust adaptive voice activity detection for nonstationary noise in multimedia communications
    Syed, Waheeduddin Q.
    Wu, Hsiao-Chun
    GLOBECOM 2007: 2007 IEEE GLOBAL TELECOMMUNICATIONS CONFERENCE, VOLS 1-11, 2007, : 3096 - 3101
  • [35] Voice Activity Detection based on Support Vector Machine using Effective Feature Vectors
    Jo, Q-Haing
    Park, Yun-Sik
    Lee, Kye-Hwan
    Song, Ji-Hyun
    Chang, Joon-Hyuk
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 981 - 984
  • [36] Robust noise detection for speech detection and enhancement
    Garner, NR
    Barrett, PA
    Howard, DM
    Tyrrell, AM
    ELECTRONICS LETTERS, 1997, 33 (04) : 270 - 271
  • [37] Voice Activity Detection Based on SVM Classifier Using Likelihood Ratio Feature Vector
    Jo, Q-Haing
    Chang, Joon-Hyuk
    Kang, Sangki
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2007, 26 (08) : 397 - 402
  • [38] Speaker-Dependent Voice Activity Detection Robust to Background Speech Noise
    Matsuda, Shigeki
    Ito, Naoya
    Tsujino, Kosuke
    Kashioka, Hideki
    Sagayama, Shigeki
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2625 - 2628
  • [39] Multi-Task Joint-Learning for Robust Voice Activity Detection
    Zhuang, Yimeng
    Tong, Sibo
    Yin, Maofan
    Qian, Yanmin
    Yu, Kai
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016
  • [40] Robust voice activity detection using group delay functions
    Krishnan, Sree Hari P.
    Padmanabhan, R.
    Murthy, Hema A.
    2006 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY, VOLS 1-6, 2006, : 1704 - +