DNN APPROACH TO SPEAKER DIARISATION USING SPEAKER CHANNELS

Cited by: 0

Authors:

Milner, Rosanna [1]

Hain, Thomas [1]

Affiliations:

[1] Univ Sheffield, Speech & Hearing Res Grp, Sheffield, S Yorkshire, England

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

speaker diarisation; multi-channel; crosstalk; deep neural networks; speaker channels; DIARIZATION; SPEECH;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Speaker diarisation addresses the question of "who speaks when" in audio recordings, and has been studied extensively in the context of tasks such as broadcast news and meetings. Performing diarisation on individual headset microphone (IHM) channels is sometimes assumed to easily give the desired output of speaker-labelled segments with timing information. However, it is shown that given imperfect data, such as speaker channels with heavy crosstalk and overlapping speech, this is not the case. Deep neural networks (DNNs) can be trained on features derived from the concatenation of speaker channel features to detect which is the correct channel for each frame. Crosstalk features can be calculated and DNNs trained with or without overlapping speech to combat problematic data. A simple frame decision metric of counting occurrences is investigated, as well as adding a bias against selecting nonspeech for a frame. Finally, two different scoring setups are applied to both datasets. The stricter SHEF setup finds diarisation error rates (DER) of 9.2% on TBL and 23.2% on RT07, while the NIST setup achieves 5.7% and 15.1% respectively.
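
The abstract does not spell out how the frame decision by counting occurrences, or the bias against nonspeech, is implemented. The following is only a minimal sketch of one possible reading, assuming each frame already carries a hard label (0 for nonspeech, 1..N for speaker channels) and that occurrences are counted in a sliding window around the frame. The names `smooth_decisions`, `half_window`, `nonspeech_penalty` and `NONSPEECH` are hypothetical and do not come from the paper.

```python
from collections import Counter

NONSPEECH = 0  # hypothetical label meaning "no speaker channel is active"


def smooth_decisions(frame_labels, half_window=2, nonspeech_penalty=0.0):
    """Relabel each frame with the most frequent label in a window around it.

    frame_labels      -- list of per-frame integer labels (assumed input)
    half_window       -- number of neighbouring frames counted on each side
    nonspeech_penalty -- amount subtracted from the nonspeech count, which
                         biases the decision against selecting nonspeech
    """
    smoothed = []
    for t in range(len(frame_labels)):
        lo = max(0, t - half_window)
        hi = min(len(frame_labels), t + half_window + 1)
        # Count how often each label occurs inside the window.
        counts = Counter(frame_labels[lo:hi])
        scores = {label: float(n) for label, n in counts.items()}
        if NONSPEECH in scores:
            scores[NONSPEECH] -= nonspeech_penalty
        # Ties go to the smallest label id (an arbitrary choice for this sketch).
        best = max(sorted(scores), key=lambda label: scores[label])
        smoothed.append(best)
    return smoothed
```

With `nonspeech_penalty=0.0` this reduces to a plain majority vote over the window. Raising the penalty lets a minority of speech frames override surrounding nonspeech frames, which is one way a bias against nonspeech could behave.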


Pages: 4925-4929

Number of pages: 5
