Speaker Clustering with Penalty Distance for Speaker Verification with Multi-Speaker Speech

被引：0

作者：

Das, Rohan Kumar ^{[1
]}

Yang, Jichen ^{[1
]}

Li, Haizhou ^{[1
]}

机构：

[1] Natl Univ Singapore, Singapore, Singapore

来源：

2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019年

关键词：

DIARIZATION; RECOGNITION; SYSTEM;

D O I：

暂无

中图分类号：

TP31 [计算机软件];

学科分类号：

081202 ; 0835 ;

摘要：

Speaker verification in a multi-speaker environment is an emerging research topic. Speaker clustering, that separates multiple speakers, can be effective if a predetermined threshold or the number of speakers present in a multi-speaker utterance is given. However, the problem in practice does not provide the leverage for either of the factors. This work proposes to handle such a problem by introducing a penalty distance factor in the pipeline of traditional clustering techniques. The proposed framework first uses traditional clustering techniques to form speaker clusters for a given number of speakers. We then compute the penalty distance based on Bayesian information criterion that is used for merging alike clusters in a multi-speaker utterance. The studies are conducted on speakers in the wild (SITW) and recent NIST SRE 2018 databases that contain multi-speaker conversational speech in noisy environments. The results show the effectiveness of the proposed penalty distance based refinement in such a scenario.

引用

页码：1630 / 1635

页数：6

共 50 条

[21] Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora
Luong, Hieu-Thi
Wang, Xin
Yamagishi, Junichi
Nishizawa, Nobuyuki
[J]. INTERSPEECH 2019, 2019, : 1303 - 1307
[22] Phoneme Duration Modeling Using Speech Rhythm-Based Speaker Embeddings for Multi-Speaker Speech Synthesis
Fujita, Kenichi
Ando, Atsushi
Ijima, Yusuke
[J]. INTERSPEECH 2021, 2021, : 3141 - 3145
[23] Multi-Lingual Multi-Speaker Text-to-Speech Synthesis for Voice Cloning with Online Speaker Enrollment
Liu, Zhaoyu
Mak, Brian
[J]. INTERSPEECH 2020, 2020, : 2932 - 2936
[24] INVESTIGATING ON INCORPORATING PRETRAINED AND LEARNABLE SPEAKER REPRESENTATIONS FOR MULTI-SPEAKER MULTI-STYLE TEXT-TO-SPEECH
Chien, Chung-Ming
Lin, Jheng-Hao
Huang, Chien-yu
Hsu, Po-chun
Lee, Hung-yi
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8588 - 8592
[25] Cross-lingual, Multi-speaker Text-To-Speech Synthesis Using Neural Speaker Embedding
Chen, Mengnan
Chen, Minchuan
Liang, Shuang
Ma, Jun
Chen, Lei
Wang, Shaojun
Xiao, Jing
[J]. INTERSPEECH 2019, 2019, : 2105 - 2109
[26] An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis
Lorincz, Beata
Stan, Adriana
Giurgiu, Mircea
[J]. KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KSE 2021), 2021, 192 : 756 - 765
[27] MULTI-SPEAKER CONVERSATIONS, CROSS-TALK, AND DIARIZATION FOR SPEAKER RECOGNITION
Sell, Gregory
McCree, Alan
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5425 - 5429
[28] Unsupervised Speaker and Expression Factorization for Multi-Speaker Expressive Synthesis of Ebooks
Chen, Langzhou
Braunschweiler, Norbert
[J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1041 - 1045
[29] SPEAKER RECOGNITION FOR MULTI-SPEAKER CONVERSATIONS USING X-VECTORS
Snyder, David
Garcia-Romero, Daniel
Sell, Gregory
McCree, Alan
Povey, Daniel
Khudanpur, Sanjeev
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5796 - 5800
[30] INVESTIGATION OF FAST AND EFFICIENT METHODS FOR MULTI-SPEAKER MODELING AND SPEAKER ADAPTATION
Zheng, Yibin
Li, Xinhui
Lu, Li
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6618 - 6622

← 1 2 3 4 5 →