Automatic speaker clustering from multi-speaker utterances

Cited by: 1

Authors:

McLaughlin, J [1]

Reynolds, D [1]

Singer, E [1]

O'Leary, GC [1]

Affiliation:

[1] MIT, Lincoln Lab, Lexington, MA 02420 USA

Source:

ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI | 1999

Keywords:

DOI:

10.1109/ICASSP.1999.759796

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Blind clustering of multi-person utterances by speaker is complicated by the fact that each utterance has at least two talkers. In the case of a two-person conversation, one can simply split each conversation into its respective speaker halves, but this introduces error which ultimately hurts clustering. We propose a clustering algorithm which is capable of associating each conversation with two clusters (and therefore two speakers), obviating the need for splitting. Results are given for two-speaker conversations culled from the Switchboard corpus, and comparisons are made to results obtained on single-speaker utterances. We conclude that although the approach is promising, our technique for computing inter-conversation similarities prior to clustering needs improvement.


Pages: 817-820

Number of pages: 4

