Analysis of transition cost and model parameters in speaker diarization for meetings

Cited by: 0

Authors:

Beatriz Martínez-González

José M. Pardo

José A. Vallejo-Pinto

Rubén San-Segundo

Javier Ferreiros

Affiliations:

[1] Universidad Tecnológica de Pereira, Department of Computer Science

[2] Universidad Politécnica de Madrid

[3] University of Oviedo

Source:

EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2021, No. 1 (2021)

Keywords:

Speaker diarization; Speaker segmentation; Model complexity selection; Speaker modeling

DOI:

Not available

Abstract:

There has been little work in the literature on speaker diarization of meetings recorded with multiple distant microphones since the 2012 publications related to the last National Institute of Standards and Technology (NIST) Rich Transcription Evaluation Campaign, held in 2009 (RT09). More recently, the Second DIHARD Challenge evaluation has also covered diarization of dinner-party meetings recorded with multiple distant microphones. Dinner-party meetings are somewhat harder than office meetings because their participants can move freely around the room. In this paper, we study some of the speaker diarization algorithms for meetings with multiple distant microphones used in the NIST Rich Transcription Evaluation Campaigns of 2007 (RT07) and 2009 (RT09), and we provide clear improvements. On the one hand, little attention has been paid to penalizing or favoring transitions between speakers, beyond imposing a minimum speaker-turn duration or computing the speakers' probabilities with Variational Bayes (VB). We studied this issue and determined that a transition penalty term is needed, and that it should be independent of both the number of active speakers and the minimum speaker-turn duration. On the other hand, an automatic method for selecting the right number of model parameters is crucial for building good speaker models. Previous studies proposed selecting the number of parameters dynamically from the duration of each speaker's speech, with mixed performance when tested on single-distant-microphone or multiple-distant-microphone meetings. In this paper, we propose a new method that uses the duration of each speaker's speech to determine a minimum number of parameters and the risk of overfitting to determine a maximum number, while also taking computation time into account in order to reduce it.
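The abstract names two mechanisms without giving formulas: a speaker-transition penalty that does not depend on the number of active speakers or on the minimum turn duration, and lower and upper bounds on the number of model parameters. The Python sketch below illustrates both under stated assumptions; it is not the authors' method. `penalized_viterbi` assumes frame-level log-likelihoods from some set of speaker models and subtracts a hypothetical constant `penalty` at every speaker change, while the minimum turn length `min_frames` is enforced separately by a duration counter. `gaussian_count_range` is a hypothetical rule for diagonal-covariance GMMs: the lower bound grows with the amount of speech (`frames_per_gaussian`), and the upper bound keeps at least `frames_per_param` frames per free parameter as a guard against overfitting. All names and thresholds are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def penalized_viterbi(
    loglik: Sequence[Sequence[float]],
    penalty: float,
    min_frames: int = 1,
) -> List[int]:
    """Hypothetical sketch: label each frame with a speaker index.

    loglik[t][k] is the log-likelihood of frame t under speaker model k.
    Every speaker change subtracts the same constant `penalty`, whatever the
    number of speakers and whatever `min_frames`; the minimum turn length is
    enforced by a separate duration counter d, not by the penalty.
    """
    if not loglik:
        return []
    n_frames, n_spk = len(loglik), len(loglik[0])
    dmax = max(1, min_frames) - 1  # d = frames already in the turn minus 1, capped
    score = [[NEG_INF] * (dmax + 1) for _ in range(n_spk)]
    for k in range(n_spk):
        score[k][0] = loglik[0][k]
    back = []  # back[t-1][k][d] = predecessor state (speaker, d) at frame t-1
    for t in range(1, n_frames):
        new = [[NEG_INF] * (dmax + 1) for _ in range(n_spk)]
        ptr = [[None] * (dmax + 1) for _ in range(n_spk)]
        for k in range(n_spk):
            # Stay with speaker k: no cost, advance the duration counter.
            for d in range(dmax + 1):
                nd = min(d + 1, dmax)
                if score[k][d] > new[k][nd]:
                    new[k][nd], ptr[k][nd] = score[k][d], (k, d)
            # Switch to k only from another speaker whose turn is long enough.
            for j in range(n_spk):
                cand = score[j][dmax] - penalty
                if j != k and cand > new[k][0]:
                    new[k][0], ptr[k][0] = cand, (j, dmax)
        for k in range(n_spk):
            for d in range(dmax + 1):
                new[k][d] += loglik[t][k]  # unreachable states stay at -inf
        score = new
        back.append(ptr)
    # Best final state; the last turn is allowed to be shorter than min_frames.
    k, d = max(((k, d) for k in range(n_spk) for d in range(dmax + 1)),
               key=lambda s: score[s[0]][s[1]])
    labels = [k]
    for ptr in reversed(back):
        k, d = ptr[k][d]
        labels.append(k)
    return labels[::-1]


def gaussian_count_range(
    n_frames: int, frames_per_gaussian: int, feat_dim: int, frames_per_param: int
) -> Tuple[int, int]:
    """Hypothetical (min, max) number of diagonal-covariance Gaussians.

    Max: keep at least `frames_per_param` frames per free parameter; a
    diagonal Gaussian has roughly 2 * feat_dim + 1 parameters (means,
    variances, weight). Min: one Gaussian per `frames_per_gaussian` frames of
    speech, capped at the max.
    """
    params_per_gaussian = 2 * feat_dim + 1
    upper = max(1, n_frames // (frames_per_param * params_per_gaussian))
    lower = max(1, min(upper, n_frames // frames_per_gaussian))
    return lower, upper
```

For example, if the speaker models favor speaker 0 in frames 0 to 2 and 4 to 6 and speaker 1 only in frame 3, a small penalty lets the decoder emit a one-frame turn for speaker 1, whereas `min_frames=3` rules that turn out without changing the penalty. Because the penalty is a single additive constant rather than a probability split among the other speakers, it would not need rescaling when the number of active speakers changes.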
