MULTI-SPEAKER CONVERSATIONS, CROSS-TALK, AND DIARIZATION FOR SPEAKER RECOGNITION

Cited by: 0
Authors
Sell, Gregory [1]
McCree, Alan [1]
Affiliations
[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Keywords
speaker diarization; speaker recognition; i-vectors
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
I-vector training and extraction assume that a speech file is spoken by a single speaker. This work considers the effects of violating that assumption through the presence of cross-talk or multi-speaker conversations. First, it is demonstrated that these problematic speech files can be detected using the i-vector representation itself. The impact of these violations of the single-speaker assumption is then explored, along with strategies to mitigate it. It is shown that, even in predominantly clean data, the removal of cross-talk can provide modest gains, but that T matrix and PLDA training are largely robust to these types of noise. It is also shown that running detection ahead of diarization is a reasonable strategy when the data contain an unknown proportion of multi-speaker conversations. Finally, in the course of this work, evidence is found that cross-talk detection and multi-speaker detection may in fact be different tasks that require separately trained detectors.
Pages: 5425 - 5429
Number of pages: 5