Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification

Cited by: 7
Authors
Wang, Zhenyu [1]
Xia, Wei [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Ctr Robust Speech Syst CRSS, Dallas, TX 75080 USA
Source
Keywords
speaker verification; cross-domain adaptation; discrepancy loss; maximum mean discrepancy; forensics; distribution alignment; RECOGNITION;
DOI
10.21437/Interspeech.2020-2738
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Forensic audio analysis for speaker verification offers unique challenges due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings. The lack of real naturalistic forensic audio corpora with ground-truth speaker identity represents a major challenge in this field. It is also difficult to directly employ small-scale domain-specific data to train complex neural network architectures due to domain mismatch and loss in performance. Alternatively, cross-domain speaker verification for multiple acoustic environments is a challenging task which could advance research in audio forensics. In this study, we introduce a CRSS-Forensics audio dataset collected in multiple acoustic environments. We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech from CRSS-Forensics. Based on this fine-tuned model, we align domain-specific distributions in the embedding space with the discrepancy loss and maximum mean discrepancy (MMD). This maintains effective performance on the clean set while simultaneously generalizing the model to other acoustic domains. From the results, we demonstrate that diverse acoustic environments affect the speaker verification performance, and that our proposed approach of cross-domain adaptation can significantly improve the results in this scenario.
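For readers unfamiliar with maximum mean discrepancy, the following is a minimal illustrative sketch of how an MMD term between two batches of speaker embeddings could be estimated. It is not the authors' implementation: the abstract does not state the kernel, bandwidth, or estimator used, so the Gaussian kernel, the biased estimator, the `bandwidth` value, the function names, and the synthetic "clean" and "shifted" embeddings are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of x and rows of y.

    The kernel choice and bandwidth are assumptions for illustration.
    """
    # Pairwise squared Euclidean distances via broadcasting.
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two embedding batches."""
    k_ss = gaussian_kernel(source, source, bandwidth)
    k_tt = gaussian_kernel(target, target, bandwidth)
    k_st = gaussian_kernel(source, target, bandwidth)
    return float(k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean())


# Synthetic stand-ins for embeddings (hypothetical data, not CRSS-Forensics).
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(64, 8))        # "clean" domain
same_domain = rng.normal(0.0, 1.0, size=(64, 8))  # same distribution
shifted = rng.normal(1.5, 1.0, size=(64, 8))      # mismatched domain

mmd_same = mmd_squared(clean, same_domain, bandwidth=4.0)
mmd_shift = mmd_squared(clean, shifted, bandwidth=4.0)
print(f"same-domain MMD^2: {mmd_same:.4f}")
print(f"shifted-domain MMD^2: {mmd_shift:.4f}")
```

In a training setup of the kind the abstract describes, a term like this would typically be added to the speaker classification loss so that embeddings from different acoustic domains are pulled toward a common distribution; how the authors weight or combine it with the discrepancy loss is not specified here.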
Pages: 2257-2261
Number of pages: 5
Related papers
50 in total
  • [1] Phoneme-Aware Adaptation with Discrepancy Minimization and Dynamically-Classified Vector for Text-independent Speaker Verification
    Wang, Jia
    Lan, Tianhao
    Chen, Jie
    Luo, Chengwen
    Wu, Chao
    Li, Jianqiang
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 6737 - 6745
  • [2] Collaborative and adversarial network for text-independent speaker verification in domain adaptation
    Qiang, Junhao
    Yang, Qun
    Gao, Jie
    Liu, Shaohan
    [J]. ELECTRONICS LETTERS, 2023, 59 (02)
  • [3] Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition
    Wang, Zhenyu
    Hansen, John H. L.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 60 - 75
  • [4] CHANNEL ADAPTATION OF PLDA FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Chen, Liping
    Lee, Kong Aik
    Ma, Bin
    Guo, Wu
    Li, Haizhou
    Dai, Li Rong
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5251 - 5255
  • [5] Supervised domain adaptation for text-independent speaker verification using limited data
    Sarfjoo, Seyyed Saeed
    Madikeri, Srikanth
    Motlicek, Petr
    Marcel, Sebastien
    [J]. INTERSPEECH 2020, 2020, : 3815 - 3819
  • [6] CROSS-LINGUAL TEXT-INDEPENDENT SPEAKER VERIFICATION USING UNSUPERVISED ADVERSARIAL DISCRIMINATIVE DOMAIN ADAPTATION
    Xia, Wei
    Huang, Jing
    Hansen, John H. L.
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5816 - 5820
  • [7] Discriminative transformation for sufficient adaptation in text-independent speaker verification
    Yang, Hao
    Dong, Yuan
    Zhao, Xianyu
    Zha, Jian
    Wang, Haila
    [J]. CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 558 - +
  • [8] Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification
    Shum, Stephen
    Dehak, Najim
    Dehak, Reda
    Glass, James R.
    [J]. ODYSSEY 2010: THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP, 2010, : 76 - 82
  • [9] Cross-Domain Gradient Discrepancy Minimization for Unsupervised Domain Adaptation
    Du, Zhekai
    Li, Jingjing
    Su, Hongzu
    Zhu, Lei
    Lu, Ke
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 3936 - 3945
  • [10] A tutorial on text-independent speaker verification
    Bimbot, F
    Bonastre, JF
    Fredouille, C
    Gravier, G
    Magrin-Chagnolleau, I
    Meignier, S
    Merlin, T
    Ortega-García, J
    Petrovska-Delacrétaz, D
    Reynolds, DA
    [J]. EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2004, 2004 (04) : 430 - 451