MULTI-SPEAKER CONVERSATIONS, CROSS-TALK, AND DIARIZATION FOR SPEAKER RECOGNITION

Authors
Sell, Gregory [1]
McCree, Alan [1]
Affiliations
[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Keywords
speaker diarization; speaker recognition; i-vectors
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
I-vector training and extraction assume that a speech file is spoken by a single speaker. This work considers the effects of violating that assumption through the presence of cross-talk or multi-speaker conversations. First, it is demonstrated that these problematic speech files can be detected using the i-vector representation itself. The impact of these violations of the single-speaker assumption is then explored, along with strategies to mitigate it. It is shown that, even in predominantly clean data, the removal of cross-talk can provide modest gains, but that T matrix and PLDA training are largely robust to these types of noise. It is also shown that running detection ahead of diarization is a reasonable strategy for data with an unknown proportion of multi-speaker conversations. Finally, in the course of this work, evidence is found that cross-talk detection and multi-speaker detection may in fact be different tasks that require separately trained detectors.
Pages: 5425-5429
Page count: 5