Exploring single channel speech separation for short-time text-dependent speaker verification

Cited by: 1
Authors
Han, Jiangyu [1 ]
Shi, Yan [1 ]
Long, Yanhua [1 ]
Liang, Jiaen [2 ]
Affiliations
[1] Shanghai Normal Univ, Key Innovat Grp Digital Humanities Resource & Res, Shanghai 200234, Peoples R China
[2] Unisound AI Technol Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speaker verification; Text-dependent; Test speech extraction; Conv-TasNet;
DOI
10.1007/s10772-022-09959-8
CLC (Chinese Library Classification) number
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract
Automatic speaker verification (ASV) has made great progress in recent years. However, ASV performance degrades significantly when the test speech is corrupted by interfering speakers, especially when multiple talkers speak at the same time. Although target speech extraction (TSE) has also attracted increasing attention, its extraction ability is constrained by the pre-saved anchor speech it requires for the target speaker. Existing TSE methods therefore cannot be used directly to extract the desired test speech in an ASV test trial, because the speaker identity of each test utterance is unknown. Based on the state-of-the-art single-channel speech separation technique Conv-TasNet, this paper designs a test speech extraction mechanism for building short-time text-dependent speaker verification systems. Instead of providing a pre-saved anchor speech for each training or test speaker, the desired test speech is extracted from a mixture by computing the pairwise dynamic time warping (DTW) distance between each Conv-TasNet output and the enrollment utterance of the speaker model in each test trial. The acoustic domain mismatch between ASV and TSE training data, as well as the behavior of speech separation in different stages of ASV system building, such as voiceprint enrollment, testing, and the PLDA backend, are investigated in detail. Experimental results show that the proposed test speech extraction mechanism brings a significant relative improvement (36.3%) in overlapped multi-talker speaker verification, and the benefits appear not only in the ASV test stage but also in target speaker modeling.
Pages: 261-268
Page count: 8
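
The abstract above describes selecting the desired test stream by computing DTW between each Conv-TasNet output and the enrollment utterance of the speaker model under test. A minimal sketch of that selection step is given below, assuming MFCC features, librosa's DTW implementation, and a path-length-normalized cost; the separator interface (`conv_tasnet.separate`), file names, and feature choices are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of the test-speech selection step: separate the overlapped test
# utterance, then keep the separated stream whose DTW cost against the
# enrollment utterance of the claimed speaker is smallest.
import numpy as np
import librosa


def dtw_cost(x, y, sr=16000, n_mfcc=20):
    """Path-length-normalized DTW cost between two waveforms on MFCC features."""
    mx = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=n_mfcc)
    my = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    D, wp = librosa.sequence.dtw(X=mx, Y=my, metric="euclidean")
    return D[-1, -1] / len(wp)  # normalize accumulated cost by warping-path length


def select_test_speech(separated_outputs, enrollment_wave, sr=16000):
    """Return the separated stream closest (in DTW cost) to the enrollment utterance."""
    costs = [dtw_cost(out, enrollment_wave, sr=sr) for out in separated_outputs]
    return separated_outputs[int(np.argmin(costs))], costs


# Assumed usage: `outputs` are the streams produced by a trained Conv-TasNet on
# the overlapped mixture; `enroll` is the enrollment waveform for this trial.
# outputs = conv_tasnet.separate(mixture)            # hypothetical separator call
# enroll, _ = librosa.load("enroll.wav", sr=16000)   # hypothetical file name
# test_speech, costs = select_test_speech(outputs, enroll)
```

The selected stream would then be scored against the speaker model in the usual ASV pipeline (e.g., embedding extraction followed by a PLDA backend), which is where the reported relative gains are measured.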