Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition

Cited by: 26
Authors
Novotny, Ondrej [1]
Plchot, Oldrich [1]
Glembek, Ondrej [1]
Cernocky, Jan "Honza" [1]
Burget, Lukas [1]
Affiliations
[1] Brno Univ Technol, Speech FIT & Ctr Excellence IT4I, Bozetechova 2, Brno 61266, Czech Republic
Source
Funding
National Science Foundation (USA);
Keywords
Speaker verification; Signal enhancement; Autoencoder; Neural network; Robustness; Embedding; Score normalization
DOI
10.1016/j.csl.2019.06.004
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this work, we present an analysis of a DNN-based autoencoder for speech enhancement, dereverberation and denoising. The target application is a robust speaker verification (SV) system. We begin by carefully designing a data augmentation process that covers a wide range of acoustic conditions and yields rich training data for the various components of our SV system. We augment several well-known SV databases with artificially noised and reverberated data and use them to train a denoising autoencoder (mapping noisy and reverberated speech to its clean version) as well as an x-vector extractor, which is currently considered state of the art in SV. We then use the autoencoder as a preprocessing step for a text-independent SV system. We compare the results achieved with autoencoder enhancement, with multi-condition PLDA training, and with both used together. We present a detailed analysis across various conditions of NIST SRE 2010 and 2016, PRISM, and re-transmitted data. We conclude that the proposed preprocessing can significantly improve both i-vector and x-vector baselines and that this technique can be used to build a robust SV system for various target domains. © 2019 Elsevier Ltd. All rights reserved.
Pages: 403-421
Page count: 19