SNR-Invariant Multitask Deep Neural Networks for Robust Speaker Verification

Cited by: 7
Authors
Yao, Qi [1 ]
Mak, Man-Wai [1 ]
Affiliation
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
Deep learning; i-vectors; multitask learning; noise robustness; speaker verification; NOISE; PLDA;
DOI
10.1109/LSP.2018.2870726
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics & Communication Technology]
Subject Classification Code
0808; 0809
Abstract
A major challenge in speaker verification is to achieve low error rates under noisy environments. We observed that background noise in utterances not only enlarges the speaker-dependent i-vector clusters but also shifts the clusters, with the amount of shift depending on the signal-to-noise ratio (SNR) of the utterances. To overcome this SNR-dependent clustering phenomenon, we propose two deep neural network (DNN) architectures: a hierarchical regression DNN (H-RDNN) and a multitask DNN (MT-DNN). The H-RDNN is formed by stacking two regression DNNs, in which the lower DNN is trained to map noisy i-vectors to their respective speaker-dependent cluster means of clean i-vectors and the upper DNN regularizes the outliers that the lower DNN cannot denoise properly. The MT-DNN is trained to denoise i-vectors (main task) and classify speakers (auxiliary task). The network leverages the auxiliary task to retain speaker information in the denoised i-vectors. Experimental results suggest that these two DNN architectures, together with a PLDA backend, significantly outperform the multicondition PLDA model and mixtures of PLDA, and that multitask learning helps to boost verification performance.
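For illustration only, the sketch below shows how an MT-DNN of the kind described in the abstract could be set up: a shared network maps a noisy i-vector to a denoised i-vector (regression, main task) while an auxiliary head classifies the speaker, and the two losses are combined. The layer sizes, the i-vector dimensionality, the number of speakers, and the auxiliary-loss weight are assumptions for the sketch, not values taken from the paper.

```python
# Minimal PyTorch sketch of a multitask i-vector denoising DNN.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

IVEC_DIM = 500        # assumed i-vector dimensionality
NUM_SPEAKERS = 300    # assumed number of training speakers
AUX_WEIGHT = 0.1      # assumed weight of the auxiliary classification loss


class MultitaskDenoiser(nn.Module):
    def __init__(self, ivec_dim=IVEC_DIM, num_speakers=NUM_SPEAKERS):
        super().__init__()
        # Shared hidden layers operating on noisy i-vectors.
        self.shared = nn.Sequential(
            nn.Linear(ivec_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
        )
        # Main task: regress the clean (denoised) i-vector.
        self.denoise_head = nn.Linear(1024, ivec_dim)
        # Auxiliary task: predict the speaker identity.
        self.speaker_head = nn.Linear(1024, num_speakers)

    def forward(self, noisy_ivec):
        h = self.shared(noisy_ivec)
        return self.denoise_head(h), self.speaker_head(h)


def multitask_loss(denoised, clean, logits, speaker_ids):
    # MSE for denoising plus weighted cross-entropy for speaker classification.
    mse = nn.functional.mse_loss(denoised, clean)
    ce = nn.functional.cross_entropy(logits, speaker_ids)
    return mse + AUX_WEIGHT * ce


if __name__ == "__main__":
    model = MultitaskDenoiser()
    noisy = torch.randn(32, IVEC_DIM)            # batch of noisy i-vectors
    clean = torch.randn(32, IVEC_DIM)            # corresponding clean targets
    spk = torch.randint(0, NUM_SPEAKERS, (32,))  # speaker labels
    denoised, logits = model(noisy)
    loss = multitask_loss(denoised, clean, logits, spk)
    loss.backward()
    print(loss.item())
```

Under the same assumptions, the H-RDNN variant would stack two such regression networks (without the classification head), with the lower network targeting the clean cluster means and the upper network trained on the lower network's outputs; the denoised i-vectors would then be scored with a PLDA backend.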
Pages: 1670 - 1674
Number of pages: 5
Related papers
50 records in total
  • [31] Text-Independent Speaker Verification Based on Deep Neural Networks and Segmental Dynamic Time Warping
    Adel, Mohamed
    Afify, Mohamed
    Gaballah, Akram
    Fayek, Magda
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 1001 - 1006
  • [32] Multi channel far field speaker verification using teacher student deep neural networks
    Jung, Jee-weon
    Heo, Hee-Soo
    Shim, Hye-jin
    Yu, Ha-Jin
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2018, 37 (06): : 483 - 488
  • [33] Segment unit shuffling layer in deep neural networks for text-independent speaker verification
    Heo, Jungwoo
    Shim, Hye-jin
    Kim, Ju-ho
    Yu, Ha-Jin
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (02): : 148 - 154
  • [34] Insights into Deep Neural Networks for Speaker Recognition
    Garcia-Romero, Daniel
    McCree, Alan
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1141 - 1145
  • [35] DEEP NEURAL NETWORKS FOR COCHANNEL SPEAKER IDENTIFICATION
    Zhao, Xiaojia
    Wang, Yuxuan
    Wang, DeLiang
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4824 - 4828
  • [36] Empowering Speaker Verification with Deep Convolutional Neural Network Vectors
    Hourri, Soufiane
    STUDIES IN INFORMATICS AND CONTROL, 2024, 33 (02): : 97 - 107
  • [37] Safety Verification of Deep Neural Networks
    Huang, Xiaowei
    Kwiatkowska, Marta
    Wang, Sen
    Wu, Min
    COMPUTER AIDED VERIFICATION, CAV 2017, PT I, 2017, 10426 : 3 - 29
  • [38] Formal Verification of Deep Neural Networks
    Narodytska, Nina
    PROCEEDINGS OF THE 2018 18TH CONFERENCE ON FORMAL METHODS IN COMPUTER AIDED DESIGN (FMCAD), 2018, : 1 - 1
  • [39] ROBUST PITCH TRACKING IN NOISY SPEECH USING SPEAKER-DEPENDENT DEEP NEURAL NETWORKS
    Liu, Yuzhou
    Wang, DeLiang
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5255 - 5259
  • [40] A ROBUST TEXT-INDEPENDENT SPEAKER VERIFICATION METHOD BASED ON SPEECH SEPARATION AND DEEP SPEAKER
    Zhao, Fei
    Li, Hao
    Zhang, Xueliang
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6101 - 6105