DNN-based speaker clustering for speaker diarisation

Cited by: 13

Authors:

Milner, Rosanna [1]

Hain, Thomas [1]

Affiliation:

[1] Univ Sheffield, Speech & Hearing Res Grp, Sheffield, S Yorkshire, England

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK

Keywords:

speaker diarisation; speaker separation; deep neural network; DIARIZATION;

DOI:

10.21437/Interspeech.2016-126

Chinese Library Classification:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Speaker diarisation, the task of answering "who spoke when?", is often considered to consist of three independent stages: speech activity detection, speaker segmentation and speaker clustering. These represent the separation of speech and non-speech, the splitting into speaker-homogeneous speech segments, followed by grouping together those which belong to the same speaker. This paper is concerned with speaker clustering, which is typically performed by bottom-up clustering using the Bayesian information criterion (BIC). We present a novel semi-supervised method of speaker clustering based on a deep neural network (DNN) model. A speaker separation DNN trained on independent data is used to iteratively relabel the test data set. This is achieved by reconfiguration of the output layer, combined with fine-tuning in each iteration. A stopping criterion involving posteriors as confidence scores is investigated. Results are shown on a meeting task (RT07) for single distant microphones and compared with standard diarisation approaches. The new method achieves a diarisation error rate (DER) of 14.8%, compared to a baseline of 19.9%.
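
The abstract names bottom-up clustering with the Bayesian information criterion (BIC) as the typical approach that the paper's DNN method is compared against. As a rough illustration of that standard approach only (not the authors' DNN method), the sketch below computes a ΔBIC score for deciding whether two speech segments should be merged. It assumes each segment is a list of feature vectors modelled by a single Gaussian with diagonal covariance, which is a simplification; the function names `log_det_diag_cov` and `delta_bic` and the `penalty_weight` parameter are hypothetical and do not come from the paper.

```python
import math


def log_det_diag_cov(frames):
    """Log-determinant of the maximum-likelihood diagonal covariance.

    `frames` is a list of equal-length feature vectors (lists of floats).
    Assumes every dimension has non-zero variance.
    """
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(var)
    return total


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC between modelling two segments separately or jointly.

    Convention used here (an assumption of this sketch): a negative value
    favours one shared Gaussian (merge), a positive value favours two
    separate Gaussians (keep apart).
    """
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    dim = len(seg_a[0])
    # Log-likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (
        n * log_det_diag_cov(seg_a + seg_b)
        - n_a * log_det_diag_cov(seg_a)
        - n_b * log_det_diag_cov(seg_b)
    )
    # Extra parameters of the second Gaussian: one mean and one variance
    # per dimension (diagonal-covariance assumption).
    n_extra_params = 2 * dim
    penalty = 0.5 * penalty_weight * n_extra_params * math.log(n)
    return gain - penalty
```

In a bottom-up scheme of this kind, the pair of clusters with the lowest ΔBIC would be merged repeatedly until no pair scores below zero; `penalty_weight` is the tunable factor usually applied to the model-complexity term.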


Pages: 2185-2189

Number of pages: 5

