Exploring single channel speech separation for short-time text-dependent speaker verification

Cited by: 1
Authors
Han, Jiangyu [1 ]
Shi, Yan [1 ]
Long, Yanhua [1 ]
Liang, Jiaen [2 ]
Affiliations
[1] Shanghai Normal Univ, Key Innovat Grp Digital Humanities Resource & Res, Shanghai 200234, Peoples R China
[2] Unisound AI Technol Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speaker verification; Text-dependent; Test speech extraction; Conv-TasNet;
DOI
10.1007/s10772-022-09959-8
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes
0808; 0809;
Abstract
Automatic speaker verification (ASV) has recently achieved great progress. However, ASV performance degrades significantly when the test speech is corrupted by interfering speakers, especially when multiple talkers speak at the same time. Although target speech extraction (TSE) has also attracted increasing attention in recent years, its extraction ability is constrained by the pre-saved anchor speech examples it requires for the target speaker. Existing TSE methods therefore cannot be used directly to extract the desired test speech in an ASV test trial, because the speaker identity of each test speech is unknown. Based on the state-of-the-art single-channel speech separation technique, Conv-TasNet, this paper designs a test speech extraction mechanism for building short-time text-dependent speaker verification systems. Instead of providing a pre-saved anchor speech for each training or test speaker, we extract the desired test speech from a mixture by computing the pairwise dynamic time warping (DTW) distance between each output of Conv-TasNet and the enrollment utterance of the speaker model in each test trial. The acoustic domain mismatch between ASV and TSE training data, as well as the behavior of speech separation in different stages of ASV system building (voiceprint enrollment, testing, and the PLDA backend), are investigated in detail. Experimental results show that the proposed test speech extraction mechanism brings a significant relative improvement (36.3%) in overlapped multi-talker speaker verification; benefits are found not only in the ASV test stage but also in target speaker modeling.
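The selection step the abstract describes, choosing among the Conv-TasNet outputs the one that best matches the trial's enrollment utterance, can be sketched as a plain DTW comparison over frame-level features. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the Euclidean frame cost, the unconstrained DTW variant, and the function names `dtw_distance` and `select_test_speech` are all assumptions; in practice the features would come from an acoustic front end such as MFCC extraction.

```python
import numpy as np

def dtw_distance(x, y):
    """Unconstrained DTW distance between feature sequences
    x (T1, D) and y (T2, D), with Euclidean frame cost."""
    t1, t2 = len(x), len(y)
    # cost[i][j] = min accumulated cost aligning x[:i] with y[:j]
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            # standard DTW step pattern: insertion, deletion, match
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[t1, t2]

def select_test_speech(separated_feats, enroll_feats):
    """Pick the separator output closest (by DTW) to the enrollment
    utterance of the claimed speaker; returns (index, distances)."""
    dists = [dtw_distance(f, enroll_feats) for f in separated_feats]
    return int(np.argmin(dists)), dists
```

Given the frame-level features of each separated stream and of the enrollment utterance, `select_test_speech` returns the index of the stream to forward to the ASV scoring backend.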
Pages: 261-268
Number of pages: 8