Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification

Cited by: 8
Authors
You, Lanhua [1]
Guo, Wu [1]
Dai, Li-Rong [1]
Du, Jun [1]
Affiliations
[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, People's Republic of China
Source
Funding
National Natural Science Foundation of China;
Keywords
Speaker verification; High-order statistics; X-vector; Multi-task learning; Unsupervised learning;
DOI
10.21437/Interspeech.2019-2264
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
X-vector based deep neural network (DNN) embedding systems have proven effective for text-independent speaker verification. This paper presents a multi-task learning architecture for training the speaker embedding DNN, with the primary task of classifying the target speakers and the auxiliary task of reconstructing the first- and higher-order statistics of the original input utterance. The proposed training strategy combines supervised and unsupervised learning in one framework to make the speaker embeddings more discriminative and robust. Experiments are carried out on the NIST SRE16 evaluation dataset and the VOiCES dataset. The results show that the proposed method outperforms the original x-vector approach with very little additional complexity.
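The multi-task objective described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the abstract does not say which statistics are reconstructed, which reconstruction loss is used, or how the two tasks are weighted. The sketch assumes mean, standard deviation, skewness and kurtosis as the targets, mean squared error for the auxiliary task, and a hypothetical weight `aux_weight`; the function names are likewise hypothetical.

```python
import numpy as np


def utterance_statistics(frames: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Concatenate per-dimension mean, std, skewness and kurtosis over time.

    frames: array of shape (num_frames, feat_dim) for a single utterance.
    The choice of these four statistics is an assumption of this sketch.
    """
    mean = frames.mean(axis=0)
    centered = frames - mean
    std = np.sqrt((centered ** 2).mean(axis=0) + eps)
    z = centered / std
    skew = (z ** 3).mean(axis=0)  # third standardized moment
    kurt = (z ** 4).mean(axis=0)  # fourth standardized moment
    return np.concatenate([mean, std, skew, kurt])


def cross_entropy(logits: np.ndarray, label: int) -> float:
    """Softmax cross-entropy for one example (primary speaker-ID task)."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


def multitask_loss(logits, label, predicted_stats, target_stats, aux_weight=0.1):
    """Speaker classification loss plus weighted statistics-reconstruction loss.

    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    classification = cross_entropy(logits, label)
    reconstruction = float(np.mean((predicted_stats - target_stats) ** 2))
    return classification + aux_weight * reconstruction
```

In a real system the logits and `predicted_stats` would come from two output heads of the embedding network, and the targets would be computed from the input features with no speaker labels required, which is what makes the auxiliary task unsupervised.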
Pages: 1158-1162
Page count: 5
Related papers
50 items in total
  • [41] A ROBUST TEXT-INDEPENDENT SPEAKER VERIFICATION METHOD BASED ON SPEECH SEPARATION AND DEEP SPEAKER
    Zhao, Fei
    Li, Hao
    Zhang, Xueliang
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6101 - 6105
  • [42] Lambda-vector modeling temporal and channel interactions for text-independent speaker verification
    Guangcun Wei
    Hang Min
    Yunfei Xu
    Yanna Zhang
    [J]. Scientific Reports, 12
  • [43] Deep Speaker Embedding with Long Short Term Centroid Learning for Text-independent Speaker Verification
    Peng, Junyi
    Gu, Rongzhi
    Zou, Yuexian
    [J]. INTERSPEECH 2020, 2020, : 3246 - 3250
  • [44] FEATURE SELECTION USING ADAPTIVE LEARNING NETWORKS FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    CHEUNG, RS
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 64 : S183 - S183
  • [45] Lambda-vector modeling temporal and channel interactions for text-independent speaker verification
    Wei, Guangcun
    Min, Hang
    Xu, Yunfei
    Zhang, Yanna
    [J]. SCIENTIFIC REPORTS, 2022, 12 (01)
  • [46] MULTI-TASK LEARNING FOR SPEAKER VERIFICATION AND VOICE TRIGGER DETECTION
    Sigtia, Siddharth
    Marchi, Erik
    Kajarekar, Sachin
    Naik, Devang
    Bridle, John
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6844 - 6848
  • [47] PARTIAL AUC OPTIMIZATION BASED DEEP SPEAKER EMBEDDINGS WITH CLASS-CENTER LEARNING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Bai, Zhongxin
    Zhang, Xiao-Lei
    Chen, Jingdong
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6819 - 6823
  • [48] Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings
    Zhang, Chunlei
    Koishida, Kazuhito
    Hansen, John H. L.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (09) : 1633 - 1644
  • [49] The Catcher in the Field: A Fieldprint based Spoofing Detection for Text-Independent Speaker Verification
    Yan, Chen
    Long, Yan
    Ji, Xiaoyu
    Xu, Wenyuan
    [J]. PROCEEDINGS OF THE 2019 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (CCS'19), 2019, : 1215 - 1229
  • [50] Exploiting High-Order Information in Heterogeneous Multi-Task Feature Learning
    Luo, Yong
    Tao, Dacheng
    Wen, Yonggang
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2443 - 2449