SNR-Invariant Multitask Deep Neural Networks for Robust Speaker Verification

Cited by: 7
Authors
Yao, Qi [1]
Mak, Man-Wai [1]
Affiliation
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
Deep learning; i-vectors; multitask learning; noise robustness; speaker verification; NOISE; PLDA;
DOI
10.1109/LSP.2018.2870726
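Chinese Library Classification (CLC) number
TM [Electrotechnics]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;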
Abstract
A major challenge in speaker verification is to achieve low error rates under noisy environments. We observed that background noise in utterances will not only enlarge the speaker-dependent i-vector clusters but also shift the clusters, with the amount of shift depending on the signal-to-noise ratio (SNR) of the utterances. To overcome this SNR-dependent clustering phenomenon, we propose two deep neural network (DNN) architectures: hierarchical regression DNN (H-RDNN) and multitask DNN (MT-DNN). The H-RDNN is formed by stacking two regression DNNs in which the lower DNN is trained to map noisy i-vectors to their respective speaker-dependent cluster means of clean i-vectors and the upper DNN aims to regularize the outliers that cannot be denoised properly by the lower DNN. The MT-DNN is trained to denoise i-vectors (main task) and classify speakers (auxiliary task). The network leverages the auxiliary task to retain speaker information in the denoised i-vectors. Experimental results suggest that these two DNN architectures together with the PLDA backend significantly outperform the multicondition PLDA model and mixtures of PLDA, and that multitask learning helps to boost verification performance.
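To make the MT-DNN idea concrete, here is a minimal NumPy sketch of the two training targets and a combined objective. It is not the authors' implementation. Everything below is an illustrative assumption: the single shared ReLU hidden layer, the layer sizes, the random weights, the helper names (`speaker_cluster_means`, `MultitaskDNN`, `multitask_loss`), and the weighting `aux_weight` between the mean-squared-error denoising loss and the cross-entropy speaker loss. The abstract specifies only that the main task maps noisy i-vectors to the speaker-dependent cluster means of clean i-vectors and that the auxiliary task classifies speakers. No training loop is shown.

```python
import numpy as np

def speaker_cluster_means(clean_ivecs, speaker_ids):
    """Mean clean i-vector per speaker, used as the denoising (regression) target."""
    return {s: clean_ivecs[speaker_ids == s].mean(axis=0) for s in np.unique(speaker_ids)}

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class MultitaskDNN:
    """Hypothetical MT-DNN: one shared hidden layer feeding two task-specific heads."""

    def __init__(self, dim, hidden, n_speakers, seed=0):
        rng = np.random.default_rng(seed)
        self.W_h = rng.normal(0.0, 0.1, (dim, hidden))
        self.b_h = np.zeros(hidden)
        self.W_d = rng.normal(0.0, 0.1, (hidden, dim))         # denoising head
        self.b_d = np.zeros(dim)
        self.W_c = rng.normal(0.0, 0.1, (hidden, n_speakers))  # speaker head
        self.b_c = np.zeros(n_speakers)

    def forward(self, noisy):
        h = relu(noisy @ self.W_h + self.b_h)        # shared representation
        denoised = h @ self.W_d + self.b_d           # main task: regression output
        probs = softmax(h @ self.W_c + self.b_c)     # auxiliary task: speaker posteriors
        return denoised, probs

def multitask_loss(denoised, targets, probs, labels, aux_weight=0.5):
    """Assumed objective: MSE to cluster means + aux_weight * speaker cross-entropy."""
    mse = np.mean(np.sum((denoised - targets) ** 2, axis=1))
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
    return mse + aux_weight * ce

# Usage on synthetic data: 3 speakers, 2 utterances each, 4-dim "i-vectors".
rng = np.random.default_rng(1)
clean = rng.normal(size=(6, 4))
labels = np.array([0, 0, 1, 1, 2, 2])
noisy = clean + rng.normal(scale=0.3, size=clean.shape)
means = speaker_cluster_means(clean, labels)
targets = np.stack([means[s] for s in labels])
net = MultitaskDNN(dim=4, hidden=8, n_speakers=3)
denoised, probs = net.forward(noisy)
loss = multitask_loss(denoised, targets, probs, labels)
```

The speaker head gives the shared layer a reason to preserve speaker identity while the regression head pulls noisy i-vectors toward their clean cluster means. In this sketch, `aux_weight` controls that trade-off.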
Pages: 1670-1674
Number of pages: 5
Related papers
50 in total
  • [41] Affect Classification in Tweets using Multitask Deep Neural Networks
    Nagar, Seema
    Shankhdhar, Achintya
    Barbhuiya, Ferdous Ahmed
    Dey, Kuntal
    WEB CONFERENCE 2021: COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2021), 2021, : 516 - 520
  • [42] Multiview Multitask Gaze Estimation With Deep Convolutional Neural Networks
    Lian, Dongze
    Hu, Lina
    Luo, Weixin
    Xu, Yanyu
    Duan, Lixin
    Yu, Jingyi
    Gao, Shenghua
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (10) : 3010 - 3023
  • [43] Mixture of Auto-Associative Neural Networks for Speaker Verification
    Sivaram, G. S. V. S.
    Thomas, Samuel
    Hermansky, Hynek
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2392 - +
  • [44] Multi-task learning of deep neural networks for joint automatic speaker verification and spoofing detection
    Li, Jiakang
    Sun, Meng
    Zhang, Xiongwei
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1517 - 1522
  • [45] Regularized Auto-Associative Neural Networks for Speaker Verification
    Garimella, Sri
    Mallidi, Harish
    Hermansky, Hynek
    IEEE SIGNAL PROCESSING LETTERS, 2012, 19 (12) : 841 - 844
  • [46] Speaker verification for security systems using artificial neural networks
    Vieira, K
    Wilamowski, B
    Kubichek, R
    IECON '97 - PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON INDUSTRIAL ELECTRONICS, CONTROL, AND INSTRUMENTATION, VOLS. 1-4, 1997, : 1102 - 1107
  • [47] Utilization of age information for speaker verification using multi-task learning deep neural networks
    Kim, Ju-ho
    Heo, Hee-Soo
    Jung, Jee-weon
    Shim, Hye-jin
    Kim, Seung-Bin
    Yu, Ha-Jin
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2019, 38 (05) : 593 - 600
  • [48] A COMPLETE END-TO-END SPEAKER VERIFICATION SYSTEM USING DEEP NEURAL NETWORKS: FROM RAW SIGNALS TO VERIFICATION RESULT
    Jung, Jee-Weon
    Heo, Hee-Soo
    Yang, Il-Ho
    Shim, Hye-Jin
    Yu, Ha-Jin
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5349 - 5353
  • [49] GENERATIVE ADVERSARIAL SPEAKER EMBEDDING NETWORKS FOR DOMAIN ROBUST END-TO-END SPEAKER VERIFICATION
    Bhattacharya, Gautam
    Monteiro, Joao
    Alam, Jahangir
    Kenny, Patrick
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6226 - 6230
  • [50] SPEAKER ADAPTATION OF CONTEXT DEPENDENT DEEP NEURAL NETWORKS
    Liao, Hank
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7947 - 7951