Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition

Cited by: 26
Authors
Novotny, Ondrej [1]
Plchot, Oldrich [1]
Glembek, Ondrej [1]
Cernocky, Jan "Honza" [1]
Burget, Lukas [1]
Affiliations
[1] Brno Univ Technol, Speech FIT & Ctr Excellence IT4I, Bozetechova 2, Brno 61266, Czech Republic
Source
Funding
U.S. National Science Foundation;
Keywords
Speaker verification; Signal enhancement; Autoencoder; Neural network; Robustness; Embedding; Score normalization
DOI
10.1016/j.csl.2019.06.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we present an analysis of a DNN-based autoencoder for speech enhancement, dereverberation and denoising. The target application is a robust speaker verification (SV) system. We start by carefully designing a data augmentation process that covers a wide range of acoustic conditions and yields rich training data for various components of our SV system. We augment several well-known databases used in SV with artificially noised and reverberated data and use them to train a denoising autoencoder (mapping noisy and reverberated speech to its clean version) as well as an x-vector extractor, which is currently considered state of the art in SV. Later, we use the autoencoder as a preprocessing step for a text-independent SV system. We compare results achieved with autoencoder enhancement, multi-condition PLDA training, and their simultaneous use. We present a detailed analysis under various conditions of NIST SRE 2010, 2016, PRISM, and with re-transmitted data. We conclude that the proposed preprocessing can significantly improve both i-vector and x-vector baselines and that this technique can be used to build a robust SV system for various target domains. © 2019 Elsevier Ltd. All rights reserved.
Pages: 403-421
Number of pages: 19