Using confidence measures to evaluate the speaker turns in speaker segmentation

Citations: 0
Authors
Chu, Wei [1]
Liu, Jia [1]
Affiliation
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Subject classification code
081203; 0835;
Abstract
In this paper, we propose a speaker segmentation algorithm using confidence measures, named CM-DISTBIC, which inserts a confidence score computation and fusion procedure into the two-step DISTBIC and MDISTBIC algorithms. In the first step, the symmetric Kullback-Leibler (KL2) distance is replaced by the Bayesian Information Criterion (BIC) distance to obtain a lower misdetection rate. In the second step, three different confidence measures are attached to the speaker change candidates according to the distance curve derived from the first step. False alarm peaks with relatively low fused confidence scores are eliminated from the set of potential speaker turns. In the third step, the remaining speaker turn candidates are validated using the BIC criterion. Compared with DISTBIC and MDISTBIC, CM-DISTBIC evaluated on broadcast news corpora achieves an increase in F-score of more than 11.5% and 8.9%, respectively.
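The abstract does not spell out the BIC distance it uses. As a rough illustration only, the sketch below computes the commonly used ΔBIC between two adjacent feature segments, assuming each segment is modeled by a single full-covariance Gaussian. The function name `delta_bic` and the `penalty_weight` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between two segments, each an array of shape (frames, dims).

    Assumption: one full-covariance Gaussian per segment (the usual textbook
    form); the paper's exact variant may differ. A positive value favors
    modeling the segments separately, i.e. a speaker change between them.
    """
    z = np.vstack([x, y])
    n_x, n_y, n_z = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet_cov(seg):
        # Maximum-likelihood covariance estimate (bias=True divides by N).
        cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (n_z * logdet_cov(z)
                  - n_x * logdet_cov(x)
                  - n_y * logdet_cov(y))
    # Penalty for the extra mean and covariance parameters.
    n_params = d + 0.5 * d * (d + 1)
    penalty = 0.5 * penalty_weight * n_params * np.log(n_z)
    return gain - penalty
```

Evaluated at successive window positions over a feature stream such as MFCC frames, this value would give a distance curve whose peaks serve as speaker change candidates. The same quantity, thresholded at zero, is the usual form of a final BIC validation step.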
Pages: 728-731
Page count: 4
Related papers
50 in total
  • [21] Speaker adaptive confidence scoring using Bayesian combining
    Kim, TY
    Ko, H
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 77 - 80
  • [22] Automatic speaker recognition using statistical measures
    Sayoud, H
    Ouamour-Sayoud, S
    INTELLIGENT AND ADAPTIVE SYSTEMS AND SOFTWARE ENGINEERING, 2004, : 100 - 103
  • [23] USING CLUSTERING COMPARISON MEASURES FOR SPEAKER RECOGNITION
    Kua, Jia Min Karen
    Epps, Julien
    Nosratighods, Mohaddeseh
    Ambikairajah, Eliathamby
    Choi, Eric
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5452 - 5455
  • [24] Using quality measures for multilevel speaker recognition
    Garcia-Romero, D
    Fierrez-Aguilar, J
    Gonzalez-Rodriguez, J
    Ortega-Garcia, J
    COMPUTER SPEECH AND LANGUAGE, 2006, 20 (2-3): : 192 - 209
  • [25] Speaker2Vec: Unsupervised Learning and Adaptation of a Speaker Manifold using Deep Neural Networks with an Evaluation on Speaker Segmentation
    Jati, Arindam
    Georgiou, Panayiotis
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3567 - 3571
  • [26] Location based speaker segmentation
    Lathoud, G
    McCowan, IA
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 176 - 179
  • [27] Voting for two speaker segmentation
    Narayanaswamy, Balakrishnan
    Gangadharaiah, Rashmi
    Stern, Richard
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2086 - +
  • [28] Accuracy and confidence in estimation of speaker age
    Waller, Sara Maria Birgitta Skoog
    INTERNATIONAL JOURNAL OF SPEECH LANGUAGE AND THE LAW, 2020, 27 (02) : 163 - 179
  • [29] Optimized speaker change detection approach for speaker segmentation towards speaker diarization based on deep learning
    VijayKumar, K.
    Rao, R. Rajeswara
    DATA & KNOWLEDGE ENGINEERING, 2023, 144
  • [30] Location based speaker segmentation
    Lathoud, G
    McCowan, IA
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL III, PROCEEDINGS, 2003, : 621 - 624