TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge

Cited by: 2
Authors
Pang, Bowen [1]
Zhao, Huan [1]
Zhang, Gaosheng [2]
Yang, Xiaoyue [2]
Sun, Yang [2]
Zhang, Li [1]
Wang, Qing [1]
Xie, Lei [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
[2] Shenzhen Transs Holding Ltd, Shenzhen, Peoples R China
Keywords
speaker diarization; spectral clustering; TS-VAD; EEND
DOI
10.1109/ISCSLP57327.2022.10037846
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes the TSUP team's submission to the ISCSLP 2022 Conversational Short-phrase Speaker Diarization (CSSD) challenge, which focuses on short-phrase conversations and introduces a new evaluation metric, the conversational diarization error rate (CDER). In this challenge, we explore three typical types of speaker diarization systems: spectral clustering (SC) based diarization, target-speaker voice activity detection (TS-VAD), and end-to-end neural diarization (EEND). Our major findings are as follows. First, under the new CDER metric, the SC approach is favored over the other two approaches. Second, hyperparameter tuning is essential to CDER for all three types of speaker diarization systems; specifically, CDER decreases as the sub-segment length is set longer. Finally, multi-system fusion through DOVER-LAP worsens CDER on the challenge data. Our submitted SC system ultimately ranked third in the challenge.
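To make the sub-segment length hyperparameter more concrete, the following is a minimal Python sketch of uniform sub-segmentation, the step in a typical SC diarization pipeline that cuts voice-activity regions into fixed-length, overlapping windows before one speaker embedding is extracted per window and the embeddings are clustered. The function name `subsegment` and the default 1.5 s window with 0.75 s shift are illustrative assumptions, not the settings reported for the TSUP system.

```python
from typing import List, Tuple

# (start, end) times in seconds; a hypothetical representation for this sketch
Segment = Tuple[float, float]


def subsegment(speech: List[Segment], length: float = 1.5,
               shift: float = 0.75) -> List[Segment]:
    """Cut each speech region into windows of at most `length` seconds.

    A new window starts every `shift` seconds. The last window of a region
    is clipped to the region end, and a region shorter than `length` is
    kept whole as a single sub-segment. Default values are illustrative only.
    """
    if length <= 0 or shift <= 0:
        raise ValueError("length and shift must be positive")
    out: List[Segment] = []
    for start, end in speech:
        t = start
        while True:
            stop = min(t + length, end)  # clip the window to the region end
            out.append((t, stop))
            if stop >= end:              # this window reached the region end
                break
            t += shift                   # advance to the next window start
    return out


if __name__ == "__main__":
    region = [(0.0, 3.0)]
    print(subsegment(region, length=1.5, shift=0.75))
    print(subsegment(region, length=3.0, shift=1.5))
```

For the same 3-second region, the 1.5 s / 0.75 s setting yields three overlapping sub-segments, while a 3.0 s / 1.5 s setting yields a single one. Longer sub-segments therefore give fewer embeddings per region, each computed from more speech, at the cost of coarser time resolution for speaker changes.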
Pages: 502-506
Number of pages: 5