Analysis of transition cost and model parameters in speaker diarization for meetings

Cited by: 0
Authors
Beatriz Martínez-González
José M. Pardo
José A. Vallejo-Pinto
Rubén San-Segundo
Javier Ferreiros
Affiliations
[1] Universidad Tecnológica de Pereira, Department of Computer Science
[2] Universidad Politécnica de Madrid
[3] University of Oviedo
Keywords
Speaker diarization; Speaker segmentation; Model complexity selection; Speaker modeling
DOI
Not available
Abstract
There has been little work in the literature on the speaker diarization of meetings with multiple distant microphones since the publications in 2012 related to the last National Institute of Standards and Technology (NIST) Rich Transcription Evaluation Campaign in 2009 (RT09). More recently, the Second DIHARD Challenge Evaluation has also covered diarization of dinner party meetings recorded with multiple distant microphones. Dinner party meetings are somewhat harder than office meetings because their participants can move freely around the room. In this paper, we study some of the speaker diarization algorithms developed for meetings with multiple distant microphones in the NIST Rich Transcription Evaluation Campaigns of 2007 (RT07) and 2009 (RT09), and provide clear improvements. On the one hand, little or no attention has been paid to the problem of penalizing or favoring transitions between speakers, other than imposing a minimum duration on a speaker turn or calculating the speakers' probabilities using Variational Bayes (VB). We have studied this issue and determined that a transition penalty term is needed, and that it should be independent of both the number of active speakers and the minimum duration of speaker turns. On the other hand, a method for automatically selecting the right number of parameters is crucial in developing good speaker models. Previous studies have proposed dynamically selecting the number of parameters based on the duration of the speaker's speech, with mixed performance when tested on meetings with a single distant microphone or with multiple distant microphones. In this paper, we propose a new method that uses the duration of the speaker's speech to set a minimum number of parameters and the risk of overfitting to set a maximum, while also taking computation time into account in order to reduce it.
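The idea of a transition penalty that does not depend on the number of active speakers can be illustrated with a small decoding sketch. This is not the authors' system: it is a generic Viterbi decoder over per-frame speaker log-likelihoods in which every change of speaker costs the same constant. The function name `viterbi_with_switch_penalty`, the input layout, and the toy scores are all assumptions made for illustration, and the minimum turn duration is deliberately left out.

```python
import numpy as np


def viterbi_with_switch_penalty(log_lik, penalty):
    """Return the best speaker label for each frame.

    log_lik: (n_frames, n_speakers) per-frame log-likelihoods under each
             speaker model (hypothetical input, not taken from the paper).
    penalty: non-negative constant subtracted from the path score each time
             the label changes; it does not scale with n_speakers.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_frames, n_speakers = log_lik.shape
    # trans[i, j] is the score added when moving from speaker i to speaker j:
    # zero for staying, -penalty for any switch.
    trans = -penalty * (1.0 - np.eye(n_speakers))
    score = log_lik[0].copy()
    back = np.zeros((n_frames, n_speakers), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + trans  # cand[i, j]: come from i, land in j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_speakers)] + log_lik[t]
    # Trace the best path back from the last frame.
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path.tolist()


# Toy scores: speaker 0 fits every frame except a one-frame blip at frame 2.
toy = [[0, -1], [0, -1], [-1, 0], [0, -1], [0, -1]]
no_penalty = viterbi_with_switch_penalty(toy, penalty=0.0)
with_penalty = viterbi_with_switch_penalty(toy, penalty=2.0)
```

With no penalty the decoder follows the blip and switches speaker for a single frame; with a penalty of 2.0 the two switches cost more than the blip gains, so the path stays on speaker 0 throughout.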
Related papers
50 in total
  • [1] Analysis of transition cost and model parameters in speaker diarization for meetings
    Martinez-Gonzalez, Beatriz
    Pardo, Jose M.
    Vallejo-Pinto, Jose A.
    San-Segundo, Ruben
    Ferreiros, Javier
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
  • [2] The SAIL Speaker Diarization System for Analysis of Spontaneous Meetings
    Han, Kyu J.
    Georgiou, Panayiotis G.
    Narayanan, Shrikanth S.
    [J]. 2008 IEEE 10TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, VOLS 1 AND 2, 2008, : 970 - 975
  • [3] IMPROVED SPEAKER DIARIZATION SYSTEM FOR MEETINGS
    El-Khoury, Elie
    Senac, Christine
    Pinquier, Julien
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4097 - 4100
  • [4] Acoustic beamforming for speaker diarization of meetings
    Anguera, Xavier
    Wooters, Chuck
    Hernando, Javier
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07): : 2011 - 2022
  • [5] Purity algorithms for speaker diarization of meetings data
    Anguera, Xavier
    Wooters, Chuck
    Hernando, Javier
    [J]. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 1025 - 1028
  • [6] Improving Speaker Diarization for CHIL Lecture Meetings
    Huang, Jing
    Marcheret, Etienne
    Visweswariah, Karthik
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2628 - 2631
  • [7] A DOA based speaker diarization system for real meetings
    Araki, Shoko
    Fujimoto, Masakiyo
    Ishizuka, Kentaro
    Sawada, Hiroshi
    Makino, Shoji
    [J]. 2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 2008, : 30 - 33
  • [8] Agglomerative Information Bottleneck for speaker diarization of meetings data
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    [J]. 2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 250 - 255
  • [9] SPEAKER DIARIZATION OF MEETINGS BASED ON SPEAKER ROLE N-GRAM MODELS
    Valente, Fabio
    Vijayasenan, Deepu
    Motlicek, Petr
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4416 - 4419
  • [10] KL-HMM BASED SPEAKER DIARIZATION SYSTEM FOR MEETINGS
    Madikeri, Srikanth
    Bourlard, Herve
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4435 - 4439