Exploring single channel speech separation for short-time text-dependent speaker verification

Cited by: 1
Authors
Han, Jiangyu [1 ]
Shi, Yan [1 ]
Long, Yanhua [1 ]
Liang, Jiaen [2 ]
Affiliations
[1] Shanghai Normal Univ, Key Innovat Grp Digital Humanities Resource & Res, Shanghai 200234, Peoples R China
[2] Unisound AI Technol Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speaker verification; Text-dependent; Test speech extraction; Conv-TasNet;
DOI
10.1007/s10772-022-09959-8
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes
0808; 0809;
Abstract
Automatic speaker verification (ASV) has recently achieved great progress. However, ASV performance degrades significantly when the test speech is corrupted by interfering speakers, especially when multiple talkers speak at the same time. Although target speech extraction (TSE) has also attracted increasing attention in recent years, its extraction ability is constrained by the pre-saved anchor speech examples it requires for the target speaker. Existing TSE methods therefore cannot be used directly to extract the desired test speech in an ASV test trial, because the speaker identity of each test speech is unknown. Based on the state-of-the-art single-channel speech separation technique, Conv-TasNet, this paper designs a test speech extraction mechanism for building short-time text-dependent speaker verification systems. Instead of providing a pre-saved anchor speech for each training or test speaker, we extract the desired test speech from a mixture by computing the pairwise dynamic time warping (DTW) distance between each output of Conv-TasNet and the enrollment utterance of the speaker model in each test trial. The acoustic domain mismatch between ASV and TSE training data, as well as the behavior of speech separation in different stages of ASV system building (voiceprint enrollment, testing, and the PLDA backend), are investigated in detail. Experimental results show that the proposed test speech extraction mechanism brings a significant relative improvement (36.3%) in overlapped multi-talker speaker verification; benefits are found not only in the ASV test stage but also in target speaker modeling.
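The selection step the abstract describes, choosing among the Conv-TasNet outputs the one that best matches the trial's enrollment utterance, can be sketched as a plain DTW comparison over frame-level features. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the Euclidean frame cost, the unconstrained DTW variant, and the function names `dtw_distance` and `select_test_speech` are all assumptions; in practice the features would come from an acoustic front end such as MFCC extraction.

```python
import numpy as np

def dtw_distance(x, y):
    """Unconstrained DTW distance between feature sequences
    x (T1, D) and y (T2, D), with Euclidean frame cost."""
    t1, t2 = len(x), len(y)
    # cost[i][j] = min accumulated cost aligning x[:i] with y[:j]
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            # standard DTW step pattern: insertion, deletion, match
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[t1, t2]

def select_test_speech(separated_feats, enroll_feats):
    """Pick the separator output closest (by DTW) to the enrollment
    utterance of the claimed speaker; returns (index, distances)."""
    dists = [dtw_distance(f, enroll_feats) for f in separated_feats]
    return int(np.argmin(dists)), dists
```

Given the frame-level features of each separated stream and of the enrollment utterance, `select_test_speech` returns the index of the stream to forward to the ASV scoring backend.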
Pages: 261-268
Number of pages: 8