IMPROVING SPEAKER RECOGNITION PERFORMANCE IN THE DOMAIN ADAPTATION CHALLENGE USING DEEP NEURAL NETWORKS

被引:0
|
作者
Garcia-Romero, Daniel [1 ]
Zhang, Xiaohui
McCree, Alan
Povey, Daniel
机构
[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
关键词
Unsupervised adaptation; speaker recognition; i-vectors; deep neural networks;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Traditional i-vector speaker recognition systems use a Gaussian mixture model (GMM) to collect sufficient statistics (SS). Recently, replacing this GMM with a deep neural network (DNN) has shown promising results. In this paper, we explore the use of DNNs to collect SS for the unsupervised domain adaptation task of the Domain Adaptation Challenge (DAC). We show that collecting SS with a DNN trained on out-of-domain data boosts the speaker recognition performance of an out-of-domain system by more than 25%. Moreover, we integrate the DNN in an unsupervised adaptation framework, that uses agglomerative hierarchical clustering with a stopping criterion based on unsupervised calibration, and show that the initial gains of the out-of-domain system carry over to the final adapted system. Despite the fact that the DNN is trained on the out-of-domain data, the final adapted system produces a relative improvement of more than 30% with respect to the best published results on this task.
引用
收藏
页码:378 / 383
页数:6
相关论文
共 50 条
  • [31] Ensemble Speaker Modeling using Speaker Adaptive Training Deep Neural Network for Speaker Adaptation
    Li, Sheng
    Lu, Xugang
    Akita, Yuya
    Kawahara, Tatsuya
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2892 - 2896
  • [32] SPEAKER ADAPTIVE TRAINING USING DEEP NEURAL NETWORKS
    Ochiai, Tsubasa
    Matsuda, Shigeki
    Lu, Xugang
    Hori, Chiori
    Katagiri, Shigeru
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [33] On robustness of unsupervised domain adaptation for speaker recognition
    Bousquet, Pierre-Michel
    Rouvier, Mickael
    INTERSPEECH 2019, 2019, : 2958 - 2962
  • [34] A deep learning approach to integrate convolutional neural networks in speaker recognition
    Hourri, Soufiane
    Nikolov, Nikola S.
    Kharroubi, Jamal
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (03) : 615 - 623
  • [35] A deep learning approach to integrate convolutional neural networks in speaker recognition
    Soufiane Hourri
    Nikola S. Nikolov
    Jamal Kharroubi
    International Journal of Speech Technology, 2020, 23 : 615 - 623
  • [36] Speaker Emotion Recognition: From Classical Classifiers To Deep Neural Networks
    Mezghani, Eya
    Charfeddine, Maha
    Nicolas, Henri
    Ben Amar, Chokri
    TENTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2017), 2018, 10696
  • [37] DOMAIN AND SPEAKER ADAPTATION FOR CORTANA SPEECH RECOGNITION
    Zhao, Yong
    Li, Jinyu
    Zhang, Shixiong
    Chen, Liping
    Gong, Yifan
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5984 - 5988
  • [38] Automatic Recognition of Speaker Age and Gender Based on Deep Neural Networks
    Markitantov, Maxim
    Verkholyak, Oxana
    SPEECH AND COMPUTER, SPECOM 2019, 2019, 11658 : 327 - 336
  • [39] Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings
    Cyrta, Pawel
    Trzcinski, Tomasz
    Stokowiec, Wojciech
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, PT I, 2018, 655 : 107 - 117
  • [40] Improving training datasets for resource-constrained speaker recognition neural networks
    Bousquet, Pierre-Michel
    Rouvier, Mickael
    INTERSPEECH 2023, 2023, : 3167 - 3171