MICROSOFT SPEAKER DIARIZATION SYSTEM FOR THE VOXCELEB SPEAKER RECOGNITION CHALLENGE 2020

Cited by: 26
Authors
Xiao, Xiong [1]
Kanda, Naoyuki [1]
Chen, Zhuo [1]
Zhou, Tianyan [1]
Yoshioka, Takuya [1]
Chen, Sanyuan [1]
Zhao, Yong [1]
Liu, Gang [1]
Wu, Yu [1]
Wu, Jian [1]
Liu, Shujie [1]
Li, Jinyu [1]
Gong, Yifan [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
speaker diarization; speaker recognition; speech separation; system fusion
DOI
10.1109/ICASSP39728.2021.9413832
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper describes the Microsoft speaker diarization system for monaural multi-talker recordings in the wild, evaluated in the diarization track of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. We first explain how our system design addresses the issues that arise in handling real multi-talker recordings. We then present the details of the components, which include a Res2Net-based speaker embedding extractor, conformer-based continuous speech separation with leakage filtering, and a modified DOVER (short for Diarization Output Voting Error Reduction) method for system fusion. We evaluate the systems on the data set provided by the VoxSRC 2020 challenge, which contains real-life multi-talker audio collected from YouTube. Our best system achieves a diarization error rate (DER) of 3.71% on the development set and 6.23% on the evaluation set, ranking first in the diarization track of the challenge.
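For readers unfamiliar with the metric quoted in the abstract, DER is conventionally the sum of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The following is a minimal frame-level sketch of that definition, not the challenge's scoring tool. It assumes at most one speaker per frame, that hypothesis labels have already been mapped to reference labels, and no forgiveness collar; the function name `frame_level_der` and the toy labels are hypothetical.

```python
from typing import Optional, Sequence


def frame_level_der(
    ref: Sequence[Optional[str]],
    hyp: Sequence[Optional[str]],
) -> float:
    """Simplified DER over equal-length frame label sequences.

    None marks a non-speech frame. Assumes hypothesis speaker labels are
    already mapped to reference labels (no optimal mapping is computed)
    and that no frame contains overlapping speakers.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")

    missed = false_alarm = confusion = total_speech = 0
    for r, h in zip(ref, hyp):
        if r is not None:
            total_speech += 1
            if h is None:
                missed += 1          # reference speech with no hypothesis speech
            elif h != r:
                confusion += 1       # speech attributed to the wrong speaker
        elif h is not None:
            false_alarm += 1         # hypothesis speech where reference is silent

    if total_speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / total_speech


# Toy example: 6 reference speech frames, 1 miss, 1 false alarm, 1 confusion.
ref = ["A", "A", "B", "B", None, None, "A", "A"]
hyp = ["A", "A", "B", "A", None, "B", "A", None]
der = frame_level_der(ref, hyp)
```

In this toy case the three error frames over six reference speech frames give a DER of 0.5. Production scorers additionally handle overlapping speech, search for the best speaker mapping, and may apply a collar around segment boundaries.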
Pages: 5824-5828
Page count: 5