Unsupervised deep feature embeddings for speaker diarization

被引:2
|
作者
Ahmad, Rehan [1 ]
Zubair, Syed [1 ]
机构
[1] Int Islamic Univ, Dept Elect Engn, Fac Engn & Technol, Islamabad, Pakistan
关键词
Diarization error rate; mel-frequency cepstral coefficients; hierarchical clustering; Gaussian mixture model; autoencoder;
D O I
10.3906/elk-1901-125
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Speaker diarization aims to determine "who spoke when?" from multispeaker recording environments. In this paper, we propose to learn a set of high-level feature representations, referred to as feature embeddings, from an unsupervised deep architecture for speaker diarization. These sets of embeddings are learned through a deep autoencoder model when trained on mel-frequency cepstral coefficients (MFCCs) of input speech frames. Learned embeddings are then used in Gaussian mixture model based hierarchical clustering for diarization. The results show that these unsupervised embeddings are better compared to MFCCs in reducing the diarization error rate. Experiments conducted on the popular subset of the AMI meeting corpus consisting of 5.4 h of recordings show that the new embeddings decrease the average diarization error rate by 2.96%. However, for individual recordings, maximum improvement of 8.05% is acquired.
引用
收藏
页码:3138 / 3149
页数:12
相关论文
共 50 条
  • [1] Deep Speaker Embeddings Based Online Diarization
    Avdeeva, Anastasia
    Novoselov, Sergey
    [J]. SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 24 - 32
  • [2] Speaker Diarization with Deep Speaker Embeddings for DIHARD Challenge II
    Novoselov, Sergey
    Gusev, Aleksei
    Ivanov, Artem
    Pekhovsky, Timur
    Shulipa, Andrey
    Avdeeva, Anastasia
    Gorlanov, Artem
    Kozlov, Alexandr
    [J]. INTERSPEECH 2019, 2019, : 1003 - 1007
  • [3] SPEAKER DIARIZATION USING DEEP NEURAL NETWORK EMBEDDINGS
    Garcia-Romero, Daniel
    Snyder, David
    Sell, Gregory
    Povey, Daniel
    McCree, Alan
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4930 - 4934
  • [4] SPEAKER DIARIZATION THROUGH SPEAKER EMBEDDINGS
    Rouvier, Mickael
    Bousquet, Pierre-Michel
    Favre, Benoit
    [J]. 2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2082 - 2086
  • [5] Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings
    Cyrta, Pawel
    Trzcinski, Tomasz
    Stokowiec, Wojciech
    [J]. INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, PT I, 2018, 655 : 107 - 117
  • [6] Enhancing Speaker Diarization with Deep Neural Network Embeddings and Spectral Clustering
    Yanshan University, China
    [J].
  • [7] Speaker-Corrupted Embeddings for Online Speaker Diarization
    Ghahabi, Omid
    Fischer, Volker
    [J]. INTERSPEECH 2019, 2019, : 386 - 390
  • [8] SPEAKER DIARIZATION WITH UNSUPERVISED TRAINING FRAMEWORKL
    Le Lan, Gael
    Meignier, Sylvain
    Charlet, Delphine
    Deleglise, Paul
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5560 - 5564
  • [9] INVESTIGATION OF SPEAKER EMBEDDINGS FOR CROSS-SHOW SPEAKER DIARIZATION
    Rouvier, Mickael
    Favre, Benoit
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5585 - 5589
  • [10] ECAPA-TDNN Embeddings for Speaker Diarization
    Dawalatabad, Nauman
    Ravanelli, Mirco
    Grondin, Francois
    Thienpondt, Jenthe
    Desplanques, Brecht
    Na, Hwidong
    [J]. INTERSPEECH 2021, 2021, : 3560 - 3564