TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge

Cited by: 2

Authors:

Pang, Bowen [1]

Zhao, Huan [1]

Zhang, Gaosheng [2]

Yang, Xiaoyue [2]

Sun, Yang [2]

Zhang, Li [1]

Wang, Qing [1]

Xie, Lei [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China

[2] Shenzhen Transs Holding Ltd, Shenzhen, Peoples R China

Source:

2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2022

Keywords:

speaker diarization; spectral clustering; TS-VAD; EEND

DOI:

10.1109/ISCSLP57327.2022.10037846

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes the TSUP team's submission to the ISCSLP 2022 conversational short-phrase speaker diarization (CSSD) challenge, which focuses on short-phrase conversations and introduces a new evaluation metric called conversational diarization error rate (CDER). In this challenge, we explore three typical kinds of speaker diarization systems: spectral clustering (SC) based diarization, target-speaker voice activity detection (TS-VAD), and end-to-end neural diarization (EEND). Our major findings are summarized as follows. First, the SC approach is favored over the other two approaches under the new CDER metric. Second, hyperparameter tuning is essential to CDER for all three types of speaker diarization systems. Specifically, CDER becomes smaller when the sub-segment length is set longer. Finally, multi-system fusion through DOVER-LAP worsens the CDER metric on the challenge data. Our submitted SC system ranked third in the challenge.

Pages: 502-506

Page count: 5
