Speaker Clustering with Penalty Distance for Speaker Verification with Multi-Speaker Speech

Cited by: 0
Authors
Das, Rohan Kumar [1]
Yang, Jichen [1]
Li, Haizhou [1]
Affiliations
[1] National University of Singapore, Singapore, Singapore
Keywords
DIARIZATION; RECOGNITION; SYSTEM;
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Speaker verification in a multi-speaker environment is an emerging research topic. Speaker clustering, which separates the speakers in an utterance, can be effective when either a predetermined threshold or the number of speakers present in the multi-speaker utterance is given. In practice, however, neither is available. This work proposes to handle the problem by introducing a penalty distance factor into the pipeline of traditional clustering techniques. The proposed framework first uses traditional clustering techniques to form speaker clusters for a given number of speakers. A penalty distance based on the Bayesian information criterion (BIC) is then computed and used to merge similar clusters in a multi-speaker utterance. The studies are conducted on the Speakers in the Wild (SITW) and recent NIST SRE 2018 databases, which contain multi-speaker conversational speech in noisy environments. The results show the effectiveness of the proposed penalty-distance-based refinement in such a scenario.
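The abstract does not give the exact form of the penalty distance, so the following is only a minimal sketch of the general idea, assuming the classical ΔBIC criterion with one full-covariance Gaussian per cluster. The function names `delta_bic` and `merge_clusters`, the weight `lam`, and the greedy merge order are illustrative assumptions, not the authors' method. An initial clustering is over-segmented for a given number of speakers, and pairs of clusters are then merged while the BIC penalty outweighs the likelihood gain of keeping them apart.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Classical ΔBIC between two clusters (rows = frames, columns = features).

    Assumption: each cluster is modelled by a single full-covariance Gaussian.
    A negative value means the BIC penalty outweighs the likelihood gain of
    two separate models, i.e. the clusters look like the same speaker.
    """
    z = np.vstack([x, y])
    n_x, n_y, n = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet(a):
        # Maximum-likelihood covariance with a small ridge for stability.
        cov = np.cov(a, rowvar=False, bias=True).reshape(d, d)
        return np.linalg.slogdet(cov + 1e-6 * np.eye(d))[1]

    gain = 0.5 * (n * logdet(z) - n_x * logdet(x) - n_y * logdet(y))
    # Penalty: number of free Gaussian parameters times log of the sample size.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


def merge_clusters(clusters, lam=1.0):
    """Greedily merge the most similar pair until no pair has ΔBIC < 0."""
    clusters = [np.asarray(c, dtype=float) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        if best[0] >= 0:  # every remaining pair is better kept separate
            break
        _, i, j = best
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

With this criterion neither the number of speakers nor a hand-tuned distance threshold is needed at merge time; the only free parameter in the sketch is the penalty weight `lam`, with `lam=1.0` corresponding to the plain BIC.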
Pages: 1630-1635
Page count: 6