DNN APPROACH TO SPEAKER DIARISATION USING SPEAKER CHANNELS

Cited by: 0
Authors
Milner, Rosanna [1]
Hain, Thomas [1]
Affiliations
[1] Univ Sheffield, Speech & Hearing Res Grp, Sheffield, S Yorkshire, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
speaker diarisation; multi-channel; crosstalk; deep neural networks; speaker channels; DIARIZATION; SPEECH;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speaker diarisation addresses the question of "who speaks when" in audio recordings, and has been studied extensively in domains such as broadcast news and meetings. Performing diarisation on individual headset microphone (IHM) channels is sometimes assumed to easily give the desired output of speaker-labelled segments with timing information. However, it is shown that given imperfect data, such as speaker channels with heavy crosstalk and overlapping speech, this is not the case. Deep neural networks (DNNs) can be trained on features derived from the concatenation of speaker channel features to detect which is the correct channel for each frame. Crosstalk features can be calculated and DNNs trained with or without overlapping speech to combat problematic data. A simple frame decision metric of counting occurrences is investigated, as well as adding a bias against selecting nonspeech for a frame. Finally, two different scoring setups are applied to both datasets. The stricter SHEF setup finds diarisation error rates (DER) of 9.2% on TBL and 23.2% on RT07, while the NIST setup achieves 5.7% and 15.1% respectively.
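The abstract does not spell out how the frame decision or the nonspeech bias is computed, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes the DNN emits one posterior per class for each frame, with class 0 as nonspeech and classes 1..N as speaker channels. The bias is assumed to be a log-domain penalty on nonspeech, and "counting occurrences" is assumed to mean a majority count over a short window of frames. All names (`biased_argmax`, `count_smooth`, `decide_frames`, `nonspeech_bias`, `half_window`) are hypothetical.

```python
from collections import Counter
import math

NONSPEECH = 0  # assumed class index: 0 = nonspeech, 1..N = speaker channels


def biased_argmax(posteriors, nonspeech_bias=0.0):
    """Pick the best class for one frame after penalising nonspeech in the log domain."""
    scores = [math.log(max(p, 1e-12)) for p in posteriors]
    scores[NONSPEECH] -= nonspeech_bias  # larger bias makes nonspeech less likely to win
    return max(range(len(scores)), key=scores.__getitem__)


def count_smooth(labels, half_window=2):
    """Relabel each frame with the most frequent label in a window around it."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half_window): i + half_window + 1]
        counts = Counter(window)
        best = max(counts.values())
        winners = [lab for lab, c in counts.items() if c == best]
        # On a tie, keep the frame's own label if it is among the winners;
        # otherwise take the first tied label encountered in the window.
        smoothed.append(labels[i] if labels[i] in winners else winners[0])
    return smoothed


def decide_frames(frame_posteriors, nonspeech_bias=0.0, half_window=2):
    """Biased per-frame decision followed by count-based smoothing."""
    raw = [biased_argmax(p, nonspeech_bias) for p in frame_posteriors]
    return count_smooth(raw, half_window)
```

With these assumptions, raising `nonspeech_bias` trades missed speech for false alarms, and `half_window` controls how aggressively isolated frame decisions are overruled by their neighbours.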
Pages: 4925-4929
Number of pages: 5