Noise robust voice activity detection using joint phase and magnitude based feature enhancement

Cited by: 0
Authors
Khomdet Phapatanaburi
Longbiao Wang
Zeyan Oo
Weifeng Li
Seiichi Nakagawa
Masahiro Iwahashi
Affiliations
[1] Nagaoka University of Technology, Tianjin Key Laboratory of Cognitive Computing and Application
[2] School of Computer Science and Technology, Graduate School at Shenzhen
[3] Tianjin University
[4] Tsinghua University
[5] Toyohashi University of Technology
Keywords
Deep neural network (DNN); Phase information; Noise-robust VAD; Feature enhancement
DOI
Not available
Abstract
Recently, deep neural network (DNN)-based feature enhancement has been proposed for many speech applications, and DNN-enhanced features have achieved higher performance than raw features. However, phase information is discarded during most conventional DNN training. In this paper, we propose a DNN-based joint phase- and magnitude-based feature (JPMF) enhancement (JPMF with DNN) and a noise-aware training (NAT)-DNN-based JPMF enhancement (JPMF with NAT-DNN) for noise-robust voice activity detection (VAD). To further improve the performance of the proposed feature enhancement, the scores of the proposed phase- and magnitude-based features are also combined. Specifically, mel-frequency cepstral coefficients (MFCCs) are used as the magnitude feature and the mel-frequency delta phase (MFDP) as the phase feature. The experimental results show that the proposed feature enhancement significantly outperforms conventional magnitude-based feature enhancement. The proposed JPMF with NAT-DNN method achieves the best relative equal error rate (EER) compared with individual magnitude- and phase-based DNN speech enhancement. Moreover, the combined score of the enhanced MFCC and MFDP using JPMF with NAT-DNN further improves the VAD performance.
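The abstract mentions combining the scores of the magnitude- and phase-based features and evaluating with the equal error rate (EER). The following is a minimal sketch of those two steps, not the authors' implementation. It assumes a linear weighted sum for the combination (the abstract does not state the rule), and the function names `fuse_scores` and `equal_error_rate`, the weight `alpha`, and the toy scores are all hypothetical.

```python
import numpy as np


def fuse_scores(mfcc_scores, mfdp_scores, alpha=0.5):
    """Combine two per-frame score streams with a weighted sum.

    Assumption: a linear combination with a hypothetical weight alpha;
    the source only says the scores are combined.
    """
    mfcc_scores = np.asarray(mfcc_scores, dtype=float)
    mfdp_scores = np.asarray(mfdp_scores, dtype=float)
    return alpha * mfcc_scores + (1.0 - alpha) * mfdp_scores


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 for speech frames, 0 for non-speech frames.
    A higher score is taken to mean "more speech-like".
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    speech = scores[labels == 1]
    nonspeech = scores[labels == 0]

    best_gap, eer = np.inf, 1.0
    for threshold in np.unique(scores):
        miss = np.mean(speech < threshold)             # speech rejected
        false_alarm = np.mean(nonspeech >= threshold)  # non-speech accepted
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (miss + false_alarm) / 2.0
    return eer


if __name__ == "__main__":
    # Toy values for illustration only; they are not results from the paper.
    labels = [1, 1, 1, 0, 0, 0]
    mfcc = [0.9, 0.4, 0.8, 0.3, 0.6, 0.1]
    mfdp = [0.7, 0.8, 0.6, 0.5, 0.2, 0.3]
    fused = fuse_scores(mfcc, mfdp, alpha=0.5)
    print("EER (MFCC only):", equal_error_rate(mfcc, labels))
    print("EER (fused):", equal_error_rate(fused, labels))
```

In practice the weight `alpha` would be tuned on held-out data, and a finer threshold sweep or interpolation would give a smoother EER estimate than this discrete search.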
Pages: 845-859
Number of pages: 14
Related papers
50 in total
  • [41] Robust Voice Activity Detection Using Gammatone Filtering and Entropy
    Ong, W. Q.
    Tan, A. W. C.
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON ROBOTICS, AUTOMATION AND SCIENCES (ICORAS 2016), 2016,
  • [42] Robust Voice Activity Detection Using Selectively Energy Features
    Wakasugi, Junichiro
    Hayasaka, Noboru
    Iiguni, Youji
    2014 21ST IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS (ICECS), 2014, : 359 - 362
  • [43] Enhancement of speech dynamics for voice activity detection using DNN
    Dwijayanti, Suci
    Yamamori, Kei
    Miyoshi, Masato
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018,
  • [44] DNN-based Feature Enhancement using Joint Training Framework for Robust Multichannel Speech Recognition
    Lee, Kang Hyun
    Kang, Tae Gyoon
    Kang, Woo Hyun
    Kim, Nam Soo
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3027 - 3031
  • [45] DNN-based feature enhancement using joint training framework for robust multichannel speech recognition
    Lee, Kang Hyun
    Kang, Tae Gyoon
    Kang, Woo Hyun
    Kim, Nam Soo
    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2016, 08-12-September-2016 : 3027 - 3031
  • [46] Enhancement of speech dynamics for voice activity detection using DNN
    Suci Dwijayanti
    Kei Yamamori
    Masato Miyoshi
    EURASIP Journal on Audio, Speech, and Music Processing, 2018
  • [47] Robust voice-activity detection based on the wavelet transform
    Stegmann, J
    Schroder, G
    1997 IEEE WORKSHOP ON SPEECH CODING FOR TELECOMMUNICATIONS, PROCEEDINGS: BACK TO BASICS: ATTACKING FUNDAMENTAL PROBLEMS IN SPEECH CODING, 1997, : 99 - 100
  • [48] Noise robust isolated word recognition using speech feature enhancement techniques
    Ecole Nationale d'Ingénieurs de Sfax ENIS, Department of Génie Electrique, BP W, 3038 Sfax, Tunisia
    Unknown
    J. Appl. Sci., 2007, 24 (3935-3942)
  • [49] An RNN and CRNN Based Approach to Robust Voice Activity Detection
    Wang, Guan-Bo
    Zhang, Wei-Qiang
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1347 - 1350
  • [50] Voice Activity Detection Based on Augmented Statistical Noise Suppression
    Obuchi, Yasunari
    Takeda, Ryu
    Kanda, Naoyuki
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,