An Investigation into Back-end Advancements for Speaker Recognition in Multi-Session and Noisy Enrollment Scenarios

Cited by: 30
Authors
Liu, Gang [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Ctr Robust Speech Syst, Richardson, TX 75252 USA
Funding
National Science Foundation (USA)
Keywords
Classification algorithms; GCDS; PLDA; speaker recognition; universal background support; SUPPORT VECTOR MACHINES; VERIFICATION; IDENTIFICATION; VARIABILITY; KERNEL; SPEECH; SYSTEM;
DOI
10.1109/TASLP.2014.2352154
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This study explores robust speaker recognition with noisy, multi-session enrollments, with an emphasis on the optimal organization and utilization of the speaker information present in the enrollment and development data. The study has two core objectives. First, we investigate more robust back-ends to address noisy multi-session enrollment data for speaker recognition, which we achieve by proposing novel back-end algorithms. Second, we construct a highly discriminative speaker verification framework, which we achieve through intrinsic and extrinsic back-end algorithm modification, resulting in complementary sub-systems. The proposed framework is evaluated on the NIST SRE2012 corpus. The results not only confirm individual sub-system advancements over an established baseline, but also show that the final grand fusion solution represents a comprehensive overall advancement for the NIST SRE2012 core tasks. Compared with state-of-the-art SID systems on the NIST SRE2012, the novel parts of this study are: 1) exploring a more diverse set of solutions for low-dimensional i-Vector based modeling; and 2) diversifying the information configuration before modeling. These two parts work together, resulting in very competitive performance at reasonable computational cost.
Pages: 1978-1992
Number of pages: 15