Factor Analysis for Speaker Segmentation and Improved Speaker Diarization

Authors
Desplanques, Brecht [1]
Demuynck, Kris [1]
Martens, Jean-Pierre [1]
Affiliations
[1] Ghent Univ iMinds, ELIS Multimedia Lab, Ghent, Belgium
Keywords
speaker change detection; speaker diarization; clustering; segmentation; factor analysis
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speaker diarization consists of two steps: speaker segmentation and speaker clustering. Speaker segmentation searches for speaker boundaries, whereas speaker clustering groups speech segments from the same speaker. In this work, segmentation is improved by replacing the Bayesian Information Criterion (BIC) with a new iVector-based approach. Unlike BIC-based methods, which trigger on any acoustic dissimilarity, the proposed method suppresses phonetic variation and accentuates speaker differences. More specifically, our method generates boundaries based on the distance between two speaker factor vectors that are extracted on a frame-by-frame basis. The extraction relies on an eigenvoice matrix, so that large differences between speaker factor vectors indicate a different speaker. A Mahalanobis-based distance measure, in which the covariance matrix compensates for the remaining, detrimental phonetic variability, is shown to generate accurate boundaries. The detected segments are clustered by a state-of-the-art iVector Probabilistic Linear Discriminant Analysis (PLDA) system. Experiments on the COST278 multilingual broadcast news database show a relative reduction of 50% in boundary detection errors. The speaker error rate is reduced by 8% relative.
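The boundary detection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `lag` between the two compared vectors, the peak-picking rule, the threshold, and the toy data are all assumptions made for illustration. It only shows how a Mahalanobis distance between two frame-level speaker factor vectors could be turned into candidate speaker boundaries.

```python
import numpy as np


def mahalanobis_change_scores(factors, cov, lag):
    """Score frame t by the Mahalanobis distance between the speaker
    factor vectors at t - lag and t + lag (lag is an assumed parameter)."""
    factors = np.asarray(factors, dtype=float)
    precision = np.linalg.inv(cov)  # inverse of the compensating covariance
    n_frames = factors.shape[0]
    scores = np.zeros(n_frames)
    for t in range(lag, n_frames - lag):
        diff = factors[t + lag] - factors[t - lag]
        scores[t] = np.sqrt(diff @ precision @ diff)
    return scores


def pick_boundaries(scores, threshold):
    """Return frame indices that are local maxima above the threshold
    (a hypothetical peak-picking rule, not taken from the paper)."""
    boundaries = []
    for t in range(1, len(scores) - 1):
        is_peak = scores[t] >= scores[t - 1] and scores[t] > scores[t + 1]
        if is_peak and scores[t] > threshold:
            boundaries.append(t)
    return boundaries


# Toy data (assumption): two synthetic "speakers" with 3-dimensional
# factor vectors and a speaker change at frame 100.
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 0.3, size=(100, 3))
speaker_b = rng.normal(0.0, 0.3, size=(100, 3)) + 5.0
factors = np.vstack([speaker_a, speaker_b])

# Assumption: the within-speaker covariance stands in for the residual
# phonetic variability that the distance measure should discount.
cov = np.cov(speaker_a, rowvar=False)

scores = mahalanobis_change_scores(factors, cov, lag=10)
boundaries = pick_boundaries(scores, threshold=15.0)
print(boundaries)
```

In this toy setup the scores stay low while both compared vectors come from the same speaker and rise sharply when the two vectors straddle the change at frame 100, so the candidate boundaries cluster around that frame. A real system would estimate the speaker factors with an eigenvoice model and learn the covariance from training data.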
Pages: 3081-3085
Page count: 5