MULTI-SPEAKER CONVERSATIONS, CROSS-TALK, AND DIARIZATION FOR SPEAKER RECOGNITION

Cited by: 0

Authors:

Sell, Gregory [1]

McCree, Alan [1]

Affiliations:

[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017

Keywords:

speaker diarization; speaker recognition; i-vectors

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

I-vector training and extraction assume that a speech file is spoken by a single speaker. This work considers the effects of violating that assumption through the presence of cross-talk or multi-speaker conversations. First, it is demonstrated that these problematic speech files can be detected using the i-vector representation itself. The impact of these violations of the single-speaker assumption is then explored, along with strategies to mitigate it. It is shown that, even in predominantly clean data, removing cross-talk can provide modest gains, but that T matrix and PLDA training are largely robust to these types of noise. It is also shown that running detection before diarization is a reasonable strategy when the data contain an unknown proportion of multi-speaker conversations. Finally, in the course of this work, evidence is found that cross-talk detection and multi-speaker detection may in fact be different tasks that require separately trained detectors.


Pages: 5425-5429

Page count: 5

