Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-Supervised Speaker Verification

Cited by: 2
Authors
Mun, Sung Hwan [1, 2]
Han, Min Hyun [1, 2]
Lee, Dongjune [1, 2]
Kim, Jihwan [1, 2]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, INMC, Seoul 08826, South Korea
Keywords
Training; Probabilistic logic; Uncertainty; Representation learning; Task analysis; Maximum likelihood estimation; Entropy; Speaker verification; self-supervised learning; bootstrap representation learning; probabilistic speaker embedding
DOI
10.1109/ACCESS.2021.3137190
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose self-supervised speaker representation learning strategies that comprise bootstrap equilibrium speaker representation learning in the front-end and uncertainty-aware probabilistic speaker embedding training in the back-end. In the front-end stage, we learn the speaker representations via a bootstrap training scheme with a uniformity regularization term. In the back-end stage, the probabilistic speaker embeddings are estimated by maximizing the mutual likelihood score between speech samples belonging to the same speaker, which provides not only speaker representations but also data uncertainty. Experimental results show that the proposed bootstrap equilibrium training strategy effectively helps learn the speaker representations and outperforms conventional methods based on contrastive learning. We also demonstrate that the integrated two-stage framework further improves speaker verification performance on the VoxCeleb1 test set in terms of EER and MinDCF.
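The abstract does not give the formula for the mutual likelihood score, so the following is only an illustrative sketch. It assumes each utterance is represented as a diagonal Gaussian (a mean vector plus a per-dimension variance), which is the usual form in the probabilistic embedding literature; the function name `mutual_likelihood_score` and its arguments are hypothetical and not taken from the paper.

```python
import numpy as np


def mutual_likelihood_score(mu_a, var_a, mu_b, var_b):
    """Log-likelihood that two diagonal-Gaussian embeddings share one latent point.

    Assumed form (not confirmed by the abstract): for each dimension, the
    difference of the two means is scored under a zero-mean Gaussian whose
    variance is the sum of the two per-utterance variances.
    """
    mu_a, var_a, mu_b, var_b = (
        np.asarray(x, dtype=float) for x in (mu_a, var_a, mu_b, var_b)
    )
    var_sum = var_a + var_b
    dim = mu_a.shape[-1]
    # Squared mean distance, scaled by the combined uncertainty, plus a
    # penalty that grows with the combined uncertainty itself.
    per_dim = (mu_a - mu_b) ** 2 / var_sum + np.log(var_sum)
    return -0.5 * (np.sum(per_dim, axis=-1) + dim * np.log(2.0 * np.pi))


# Two confident embeddings with matching means score higher than
# two confident embeddings with distant means.
same = mutual_likelihood_score([0.0, 1.0], [0.5, 0.5], [0.0, 1.0], [0.5, 0.5])
diff = mutual_likelihood_score([0.0, 1.0], [0.5, 0.5], [3.0, -2.0], [0.5, 0.5])
```

Under this assumed form, a pair scores highly only when the means agree and both variances are small, which is one way a score can carry data uncertainty alongside the speaker representation.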
Pages: 167615-167627
Page count: 13