Unsupervised deep feature embeddings for speaker diarization

Cited by: 2
Authors
Ahmad, Rehan [1]
Zubair, Syed [1]
Affiliations
[1] Int Islamic Univ, Dept Elect Engn, Fac Engn & Technol, Islamabad, Pakistan
Keywords
Diarization error rate; mel-frequency cepstral coefficients; hierarchical clustering; Gaussian mixture model; autoencoder
DOI
10.3906/elk-1901-125
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speaker diarization aims to determine "who spoke when?" in multispeaker recordings. In this paper, we propose learning a set of high-level feature representations, referred to as feature embeddings, with an unsupervised deep architecture for speaker diarization. The embeddings are learned by a deep autoencoder trained on mel-frequency cepstral coefficients (MFCCs) of input speech frames. The learned embeddings are then used in Gaussian mixture model based hierarchical clustering for diarization. The results show that these unsupervised embeddings outperform MFCCs in reducing the diarization error rate. Experiments on the popular subset of the AMI meeting corpus, consisting of 5.4 h of recordings, show that the new embeddings decrease the average diarization error rate by 2.96%. For individual recordings, the maximum improvement is 8.05%.
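The diarization error rate reported in the abstract is conventionally the sum of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The sketch below illustrates that definition at the frame level. It is not the authors' scoring procedure: the function name `frame_level_der`, the use of `None` for non-speech frames, and the toy labels are hypothetical, and it assumes that hypothesis speaker labels have already been mapped to reference speakers, that there is no overlapping speech, and that no scoring collar is applied.

```python
def frame_level_der(reference, hypothesis):
    """Frame-level diarization error rate (illustrative sketch).

    Both arguments are equal-length sequences of per-frame speaker labels,
    with None marking non-speech. Hypothesis labels are assumed to be
    already mapped to the reference speaker names.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1              # frame counts toward reference speech time
            if hyp is None:
                missed += 1          # speech labeled as non-speech
            elif hyp != ref:
                confusion += 1       # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1         # non-speech labeled as speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Toy example: 10 frames, two speakers "A" and "B".
reference = ["A", "A", "A", "B", "B", None, None, "B", "A", "A"]
hypothesis = ["A", "A", "B", "B", "B", "B", None, None, "A", "A"]
print(frame_level_der(reference, hypothesis))  # 0.375
```

In the toy example there is one confused frame, one false-alarm frame, and one missed frame over eight reference speech frames, giving 3/8 = 0.375.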
Pages: 3138-3149
Page count: 12