DNN APPROACH TO SPEAKER DIARISATION USING SPEAKER CHANNELS

Authors
Milner, Rosanna [1]
Hain, Thomas [1]
Affiliation
[1] Univ Sheffield, Speech & Hearing Res Grp, Sheffield, S Yorkshire, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
speaker diarisation; multi-channel; crosstalk; deep neural networks; speaker channels; diarization; speech
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speaker diarisation addresses the question of "who speaks when" in audio recordings and has been studied extensively for tasks such as broadcast news and meetings. Performing diarisation on individual headset microphone (IHM) channels is sometimes assumed to trivially yield the desired output of speaker-labelled segments with timing information. However, this paper shows that this is not the case for imperfect data, such as speaker channels with heavy crosstalk and overlapping speech. Deep neural networks (DNNs) can be trained on features derived from the concatenation of the speaker channel features to detect the correct channel for each frame. To handle such problematic data, crosstalk features can be calculated, and DNNs can be trained with or without overlapping speech. A simple frame decision metric based on counting occurrences is investigated, as well as adding a bias against selecting nonspeech for a frame. Finally, two different scoring setups are applied to both datasets, TBL and RT07. The stricter SHEF setup yields diarisation error rates (DERs) of 9.2% on TBL and 23.2% on RT07, while the NIST setup yields 5.7% and 15.1%, respectively.
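The abstract does not spell out the frame decision rule, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the DNN emits one posterior vector per frame over a nonspeech class (label 0) plus one class per speaker channel (labels 1..N), that "counting occurrences" means a majority count of per-frame argmax labels over a centred window, and that the bias against nonspeech is a multiplicative weight below 1 on the nonspeech count. The names `frame_argmax`, `count_decide`, `window` and `nonspeech_bias` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical label convention (assumption): 0 = nonspeech, 1..N = speaker channels.
NONSPEECH = 0


def frame_argmax(posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Hard per-frame decision: index of the largest DNN output in each frame."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def count_decide(labels: Sequence[int], window: int = 3,
                 nonspeech_bias: float = 1.0) -> List[int]:
    """Relabel each frame with the most frequent label in a centred window.

    The nonspeech count is multiplied by `nonspeech_bias` (assumed weighting),
    so values below 1.0 make nonspeech harder to select. On a tie, a speech
    label is preferred over nonspeech, then the lower channel index.
    """
    half = window // 2
    decisions: List[int] = []
    for t in range(len(labels)):
        lo, hi = max(0, t - half), min(len(labels), t + half + 1)
        scores = {lab: float(n) for lab, n in Counter(labels[lo:hi]).items()}
        if NONSPEECH in scores:
            scores[NONSPEECH] *= nonspeech_bias
        decisions.append(
            max(scores, key=lambda lab: (scores[lab], lab != NONSPEECH, -lab)))
    return decisions


if __name__ == "__main__":
    # Toy posteriors for 5 frames over [nonspeech, channel 1, channel 2].
    posteriors = [[0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1],
                  [0.1, 0.8, 0.1], [0.2, 0.7, 0.1]]
    labels = frame_argmax(posteriors)  # [1, 0, 0, 1, 1]
    print(count_decide(labels, window=3, nonspeech_bias=1.0))  # [1, 0, 0, 1, 1]
    print(count_decide(labels, window=3, nonspeech_bias=0.4))  # [1, 1, 1, 1, 1]
```

In this toy sequence, an unbiased count keeps the two-frame nonspeech gap, whereas a bias of 0.4 relabels it as channel 1, illustrating how a bias against nonspeech can trade missed speech for possible false alarms.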
Pages: 4925-4929
Number of pages: 5