IMPROVING SPEAKER RECOGNITION PERFORMANCE IN THE DOMAIN ADAPTATION CHALLENGE USING DEEP NEURAL NETWORKS

Authors
Garcia-Romero, Daniel [1]
Zhang, Xiaohui
McCree, Alan
Povey, Daniel
Affiliations
[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Keywords
Unsupervised adaptation; speaker recognition; i-vectors; deep neural networks
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional i-vector speaker recognition systems use a Gaussian mixture model (GMM) to collect sufficient statistics (SS). Recently, replacing this GMM with a deep neural network (DNN) has shown promising results. In this paper, we explore the use of DNNs to collect SS for the unsupervised domain adaptation task of the Domain Adaptation Challenge (DAC). We show that collecting SS with a DNN trained on out-of-domain data boosts the speaker recognition performance of an out-of-domain system by more than 25%. Moreover, we integrate the DNN into an unsupervised adaptation framework that uses agglomerative hierarchical clustering with a stopping criterion based on unsupervised calibration, and show that the initial gains of the out-of-domain system carry over to the final adapted system. Although the DNN is trained on out-of-domain data, the final adapted system produces a relative improvement of more than 30% over the best published results on this task.
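As a rough illustration of what "collecting sufficient statistics" means here: in i-vector systems the SS are usually the zeroth- and first-order Baum-Welch statistics, computed from per-frame class posteriors. The sketch below assumes that form; it is not the paper's implementation. The function names, array shapes, and the random stand-in posteriors are all hypothetical. In a GMM-based system the posteriors would come from the mixture components, and in the DNN-based variant from the network's output layer.

```python
import numpy as np


def collect_sufficient_stats(features, posteriors):
    """Accumulate zeroth- and first-order statistics for one utterance.

    features:   (T, D) array of acoustic feature frames.
    posteriors: (T, C) array of per-frame class posteriors (rows sum to 1),
                produced by either a GMM or a DNN (assumption: same interface).
    Returns (zeroth, first) with shapes (C,) and (C, D).
    """
    features = np.asarray(features, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    if features.shape[0] != posteriors.shape[0]:
        raise ValueError("features and posteriors must have the same number of frames")
    zeroth = posteriors.sum(axis=0)   # soft frame count per class
    first = posteriors.T @ features   # posterior-weighted feature sums per class
    return zeroth, first


def softmax(logits):
    """Row-wise softmax, used only to build stand-in posteriors for the demo."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical toy data: 200 frames, 20-dim features, 8 classes.
rng = np.random.default_rng(0)
T, D, C = 200, 20, 8
features = rng.normal(size=(T, D))
posteriors = softmax(rng.normal(size=(T, C)))  # placeholder for GMM or DNN output

zeroth, first = collect_sufficient_stats(features, posteriors)
```

Because only the source of `posteriors` changes between the two approaches, the downstream i-vector extraction can in principle stay the same, which is one way to read the abstract's claim that the DNN simply replaces the GMM in this step.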
Pages: 378-383
Page count: 6