TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge

Cited by: 2

Authors:

Pang, Bowen [1]

Zhao, Huan [1]

Zhang, Gaosheng [2]

Yang, Xiaoyue [2]

Sun, Yang [2]

Zhang, Li [1]

Wang, Qing [1]

Xie, Lei [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China

[2] Shenzhen Transs Holding Ltd, Shenzhen, Peoples R China

Source:

2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2022

Keywords:

speaker diarization; spectral clustering; TS-VAD; EEND;

DOI:

10.1109/ISCSLP57327.2022.10037846

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes the TSUP team's submission to the ISCSLP 2022 conversational short-phrase speaker diarization (CSSD) challenge, which focuses on short-phrase conversations and introduces a new evaluation metric called conversational diarization error rate (CDER). In this challenge, we explore three typical kinds of speaker diarization systems: spectral clustering (SC) based diarization, target-speaker voice activity detection (TS-VAD), and end-to-end neural diarization (EEND). Our major findings are summarized as follows. First, the SC approach is favored over the other two approaches under the new CDER metric. Second, hyperparameter tuning is essential to CDER for all three types of speaker diarization systems. Specifically, CDER decreases as the sub-segment length is set longer. Finally, multi-system fusion through DOVER-LAP worsens CDER on the challenge data. Our submitted SC system ultimately ranked third in the challenge.
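
The abstract names two ingredients of the SC pipeline: cutting speech into sub-segments of a tunable length, and clustering the resulting speaker embeddings spectrally. The following is a minimal sketch of both ideas, not the TSUP system itself. The function names, the window and hop values, the restriction to two speakers, and the toy 2-D "embeddings" are all assumptions made for illustration; the abstract does not specify the embedding extractor, the affinity, or the clustering details.

```python
import numpy as np


def split_into_subsegments(start, end, win=1.5, hop=0.75):
    """Cut one speech segment [start, end) into overlapping sub-segments.

    win and hop (in seconds) are hypothetical defaults, not values from the paper.
    """
    subsegments = []
    t = start
    while t < end:
        subsegments.append((t, min(t + win, end)))
        if t + win >= end:  # this window already reaches the segment end
            break
        t += hop
    return subsegments


def spectral_two_way(embeddings):
    """Assign each embedding to one of two clusters using the Fiedler vector.

    Simplification: a two-speaker split on an unnormalized graph Laplacian.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    affinity = np.clip(x @ x.T, 0.0, None)  # cosine similarity, negatives clipped to 0
    np.fill_diagonal(affinity, 0.0)         # no self-loops
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    fiedler = eigvecs[:, 1]                 # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)


# Same 6-second segment, two hypothetical sub-segment lengths.
short = split_into_subsegments(0.0, 6.0, win=1.5, hop=0.75)
long = split_into_subsegments(0.0, 6.0, win=3.0, hop=1.5)
print(len(short), len(long))

# Toy stand-ins for speaker embeddings: three near one direction, three near another.
toy = np.array([
    [1.00, 0.10], [0.90, 0.00], [1.00, 0.05],
    [0.10, 1.00], [0.00, 0.90], [0.05, 1.00],
])
labels = spectral_two_way(toy)
print(labels)
```

In this sketch, a longer `win` yields fewer and longer sub-segments for the same speech, which is the hyperparameter the abstract reports as affecting CDER. Cluster indices are arbitrary, so only the grouping of the labels is meaningful, not which group is called 0 or 1.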

Pages: 502-506

Page count: 5
