ECAPA-TDNN Embeddings for Speaker Diarization

Cited by: 25
Authors
Dawalatabad, Nauman [1, 2]
Ravanelli, Mirco [2]
Grondin, Francois [3]
Thienpondt, Jenthe [4]
Desplanques, Brecht [4]
Na, Hwidong [5, 6]
Affiliations
[1] Indian Inst Technol Madras, Madras, Tamil Nadu, India
[2] Mila Quebec Artificial Intelligence Inst, Montreal, PQ, Canada
[3] Univ Sherbrooke, Sherbrooke, PQ, Canada
[4] Univ Ghent, IMEC, IDLab, Ghent, Belgium
[5] Samsung Adv Inst Technol, Suwon, South Korea
[6] SAIT AI Lab, Montreal, PQ, Canada
Source
INTERSPEECH 2021
Keywords
speaker diarization; speaker embedding; data augmentation; spectral clustering
DOI
10.21437/Interspeech.2021-941
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural networks can accurately capture speaker-discriminative characteristics, and popular deep embeddings such as x-vectors are now a fundamental component of modern diarization systems. Recently, several improvements over the standard TDNN architecture used for x-vectors have been proposed. The ECAPA-TDNN model, for instance, has shown impressive performance in speaker verification, thanks to a carefully designed neural architecture. In this work, we extend, for the first time, the use of the ECAPA-TDNN model to speaker diarization. Moreover, we improve its robustness with a powerful augmentation scheme that concatenates several contaminated versions of the same signal within the same training batch. The ECAPA-TDNN model provides robust speaker embeddings under both close-talking and distant-talking conditions. Our results on the popular AMI meeting corpus show that our system significantly outperforms recently proposed approaches.
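The augmentation scheme mentioned in the abstract (several contaminated versions of the same signal concatenated within one training batch) can be illustrated with a minimal NumPy sketch. The abstract does not say which contaminations are used, so the white-noise and synthetic-reverberation functions below, along with every function name, parameter, and default value, are assumptions chosen for illustration and not the authors' implementation.

```python
import numpy as np


def add_noise(wavs, snr_db, rng):
    """Add white Gaussian noise to each waveform at the requested SNR (dB).

    Assumption: white noise stands in for whatever noise source is really used.
    """
    signal_power = np.mean(wavs ** 2, axis=1, keepdims=True)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(wavs.shape) * np.sqrt(noise_power)
    return wavs + noise


def add_reverb(wavs, rng, rir_len=800, decay=0.005):
    """Convolve each waveform with a synthetic, exponentially decaying impulse response.

    Assumption: a random decaying filter stands in for a measured room response.
    """
    rir = rng.standard_normal(rir_len) * np.exp(-decay * np.arange(rir_len))
    rir[0] = 1.0  # direct path
    rir /= np.linalg.norm(rir)
    # Truncate the convolution so every output keeps the input length.
    return np.stack([np.convolve(w, rir)[: wavs.shape[1]] for w in wavs])


def augment_batch(wavs, labels, rng, snr_db=10.0):
    """Concatenate the clean batch with contaminated copies along the batch axis."""
    noisy = add_noise(wavs, snr_db, rng)
    reverbed = add_reverb(wavs, rng)
    both = add_noise(reverbed, snr_db, rng)
    versions = [wavs, noisy, reverbed, both]
    aug_wavs = np.concatenate(versions, axis=0)
    # Each copy keeps the speaker label of the signal it was derived from.
    aug_labels = np.tile(labels, len(versions))
    return aug_wavs, aug_labels


# Usage: two one-second signals at a hypothetical 16 kHz sampling rate.
rng = np.random.default_rng(0)
wavs = rng.standard_normal((2, 16000))
labels = np.array([0, 1])
aug_wavs, aug_labels = augment_batch(wavs, labels, rng)
print(aug_wavs.shape, aug_labels)
```

With this layout the effective batch size grows by the number of versions (four here), and the network sees the clean and contaminated renditions of each utterance in the same gradient step.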
Pages: 3560-3564
Number of pages: 5