MULTI-SPEAKER CONVERSATIONS, CROSS-TALK, AND DIARIZATION FOR SPEAKER RECOGNITION

Cited by: 0

Authors:

Sell, Gregory [1]

McCree, Alan [1]

Affiliations:

[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017

Keywords:

speaker diarization; speaker recognition; i-vectors;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

I-vector training and extraction assume that a speech file is spoken by a single speaker. This work considers the effects of violating that assumption through the presence of cross-talk or multi-speaker conversations. First, it is demonstrated that these problematic speech files can be detected using the i-vector representation itself. The impact of these violations of the single-speaker assumption is then explored, along with strategies to mitigate it. It is shown that, even in predominantly clean data, the removal of cross-talk can provide modest gains, but that T matrix and PLDA training are largely robust to these types of noise. It is also shown that placing detection in front of diarization is a reasonable strategy for data with an unknown proportion of multi-speaker conversations. Finally, in the course of this work, evidence is found that cross-talk detection and multi-speaker detection may in fact be different tasks that require separately trained detectors.

Pages: 5425-5429

Page count: 5

50 items in total

[41] INVESTIGATION OF FAST AND EFFICIENT METHODS FOR MULTI-SPEAKER MODELING AND SPEAKER ADAPTATION
Zheng, Yibin
Li, Xinhui
Lu, Li
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6618 - 6622
[42] Mahalanobis Based Emission Model for Speaker Diarization of Telephone Conversations
Furmanov, Tal
Aminov, Lidiya
Moyal, Ami
Lapidot, Itshak
[J]. 2014 IEEE 28TH CONVENTION OF ELECTRICAL & ELECTRONICS ENGINEERS IN ISRAEL (IEEEI), 2014,
[43] Multiple feature combination to improve speaker diarization of telephone conversations
Gupta, Vishwa
Kenny, Patrick
Ouellet, Pierre
Boulianne, Gilles
Dumouchel, Pierre
[J]. 2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 705 - 710
[44] Multi-Speaker Adaptation for Robust Speech Recognition under Ubiquitous Environment
Shih, Po-Yi
Wang, Jhing-Fa
Lin, Yuan-Ning
Fu, Zhong-Hua
[J]. ORIENTAL COCOSDA 2009 - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2009, : 126 - 131
[45] A Purely End-to-end System for Multi-speaker Speech Recognition
Seki, Hiroshi
Hori, Takaaki
Watanabe, Shinji
Le Roux, Jonathan
Hershey, John R.
[J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 2620 - 2630
[46] Multi-Speaker Meeting Audio Segmentation
Nwe, Tin Lay
Dong, Minghui
Khine, Swe Zin Kalayar
Li, Haizhou
[J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2522 - 2525
[47] Investigation of Cross-show Speaker Diarization
Yang, Qian
Jin, Qin
Schultz, Tanja
[J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2936 - +
[48] Application of combined temporal and spectral processing methods for speaker recognition under noisy, reverberant or multi-speaker environments
P. Krishnamoorthy
S. R. Mahadeva Prasanna
[J]. Sadhana, 2009, 34 : 729 - 754
[49] Improving speaker diarization by cross EM refinement
Ning, Huazhong
Xu, Wei
Gong, Yihong
Huang, Thomas
[J]. 2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 1901 - 1904
[50] Application of combined temporal and spectral processing methods for speaker recognition under noisy, reverberant or multi-speaker environments
Krishnamoorthy, P.
Prasanna, S. R. Mahadeva
[J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2009, 34 (05): : 729 - 754
