Speaker Clustering with Penalty Distance for Speaker Verification with Multi-Speaker Speech

Cited by: 0

Authors:

Das, Rohan Kumar [1]

Yang, Jichen [1]

Li, Haizhou [1]

Affiliation:

[1] Natl Univ Singapore, Singapore, Singapore

Source:

2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019

Keywords:

DIARIZATION; RECOGNITION; SYSTEM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Speaker verification in a multi-speaker environment is an emerging research topic. Speaker clustering, which separates the speakers in an utterance, can be effective when either a predetermined threshold or the number of speakers present in the multi-speaker utterance is given. In practice, however, neither is usually available. This work addresses the problem by introducing a penalty distance factor into the pipeline of traditional clustering techniques. The proposed framework first uses traditional clustering techniques to form speaker clusters for a given number of speakers. A penalty distance based on the Bayesian information criterion is then computed and used to merge similar clusters in a multi-speaker utterance. The studies are conducted on the Speakers in the Wild (SITW) and recent NIST SRE 2018 databases, which contain multi-speaker conversational speech in noisy environments. The results show the effectiveness of the proposed penalty-distance-based refinement in such a scenario.
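The merging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' penalty distance, whose exact form the abstract does not give; it assumes the widely used ΔBIC criterion with one full-covariance Gaussian per cluster. The names `delta_bic`, `merge_similar_clusters`, and the penalty weight `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_det_cov(x, eps=1e-6):
    """Log-determinant of the (lightly regularized) sample covariance of x, shape (n, d)."""
    d = x.shape[1]
    cov = np.cov(x, rowvar=False, bias=True).reshape(d, d) + eps * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(a, b, lam=1.0):
    """Standard Delta-BIC between two clusters (assumed criterion, not the paper's).

    Negative values mean a single Gaussian explains both clusters well enough,
    i.e. the clusters are candidates for merging.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    d = a.shape[1]
    merged = np.vstack([a, b])
    gain = 0.5 * (n * log_det_cov(merged)
                  - n1 * log_det_cov(a)
                  - n2 * log_det_cov(b))
    # Model-complexity penalty: d mean parameters plus d(d+1)/2 covariance parameters.
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return gain - penalty


def merge_similar_clusters(clusters, lam=1.0):
    """Greedily merge the pair with the lowest Delta-BIC until no pair scores below zero."""
    clusters = list(clusters)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:
            break  # every remaining pair is better modeled as two speakers
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

In this sketch, the initial clusters would come from a traditional clustering step run with a deliberately generous speaker count, and each cluster is an array of per-segment feature vectors. The refinement then reduces over-split speakers without requiring the true number of speakers in advance.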


Pages: 1630-1635

Number of pages: 6

