Robust Speaker Diarization for Short Speech Recordings

Cited by: 11
Authors
Imseng, David [1 ,2 ]
Friedland, Gerald [3 ]
Affiliations
[1] Idiap Res Inst, POB 592, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
DOI
10.1109/ASRU.2009.5373254
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We investigate a state-of-the-art Speaker Diarization system regarding its behavior on meetings that are much shorter (from 500 seconds down to 100 seconds) than those typically analyzed in Speaker Diarization benchmarks. First, the problems inherent to this task are analyzed. Then, we propose an approach that consists of a novel initialization parameter estimation method for typical state-of-the-art diarization approaches. The estimation method balances the relationship between the optimal value of the duration of speech data per Gaussian and the duration of the speech data, which is verified experimentally for the first time in this article. As a result, the Diarization Error Rate for short meetings extracted from the 2006, 2007, and 2009 NIST RT evaluation data is decreased by up to 50% relative.
Pages: 432 / +
Page count: 2