MULTI-LEVEL DEEP NEURAL NETWORK ADAPTATION FOR SPEAKER VERIFICATION USING MMD AND CONSISTENCY REGULARIZATION

Authors
Lin, Weiwei [1, 2]
Mak, Man-Wai [1]
Li, Na [2]
Su, Dan [2]
Yu, Dong [2]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
Keywords
Speaker verification; domain adaptation; data augmentation; maximum mean discrepancy; transfer learning
DOI
10.1109/icassp40776.2020.9054134
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Adapting speaker verification (SV) systems to a new environment is a challenging task. Current adaptation methods in SV mainly focus on the backend, i.e., adaptation is carried out after the speaker embeddings have been created. In this paper, we present a DNN-based adaptation method using maximum mean discrepancy (MMD). Our method exploits two important aspects neglected by previous research. First, instead of minimizing domain discrepancy at the utterance level alone, our method minimizes domain discrepancy at both the frame level and the utterance level, which we believe will make the adaptation more robust to the duration discrepancy between training data and test data. Second, we introduce a consistency regularization for unlabelled target-domain data. The consistency regularization encourages the target speaker embeddings to be robust to adverse perturbations. Experiments on NIST SRE 2016 and 2018 show that our DNN adaptation works significantly better than previously proposed DNN adaptation methods. Moreover, our method works well with backend adaptation. By combining the proposed method with backend adaptation, we achieve a 9% improvement over backend adaptation alone on SRE18.
Pages: 6839-6843
Page count: 5
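
Illustrative sketch (not taken from the paper): the abstract describes minimizing MMD between source-domain and target-domain features at both the frame level and the utterance level, plus a consistency term on unlabelled target data. The NumPy code below shows one common way such terms can be computed. The Gaussian kernel, the biased MMD estimator, the squared-error consistency penalty, and all function names, bandwidths, and weights are assumptions made for illustration; the abstract does not specify the authors' choices.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=4.0):
    """Gaussian kernel matrix between rows of x (n, d) and rows of y (m, d)."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2(source, target, bandwidth=4.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    k_ss = gaussian_kernel(source, source, bandwidth).mean()
    k_tt = gaussian_kernel(target, target, bandwidth).mean()
    k_st = gaussian_kernel(source, target, bandwidth).mean()
    return float(k_ss + k_tt - 2.0 * k_st)


def consistency_penalty(clean_emb, perturbed_emb):
    """Mean squared distance between embeddings of clean and perturbed inputs.

    Assumed form of a consistency regularizer; row i of both arrays must
    come from the same target-domain utterance.
    """
    return float(np.mean(np.sum((clean_emb - perturbed_emb) ** 2, axis=1)))


def adaptation_loss(src_frames, tgt_frames, src_utts, tgt_utts,
                    tgt_utts_perturbed, consistency_weight=1.0):
    """Hypothetical combined loss: frame-level MMD + utterance-level MMD
    + weighted consistency term. The weighting scheme is an assumption."""
    return (
        mmd2(src_frames, tgt_frames)
        + mmd2(src_utts, tgt_utts)
        + consistency_weight * consistency_penalty(tgt_utts, tgt_utts_perturbed)
    )
```

In a real system these terms would be computed on activations of the embedding network (frame-level layers and the pooled utterance-level layer) with an automatic-differentiation framework, and added to the speaker classification loss on labelled source data; the sketch only shows the arithmetic of each term.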