Spot the conversation: speaker diarisation in the wild

Cited by: 27

Authors:

Chung, Joon Son [1, 2]

Huh, Jaesung [1, 2]

Nagrani, Arsha [1]

Afouras, Triantafyllos [1]

Zisserman, Andrew [1]

Affiliations:

[1] Univ Oxford, Dept Engn Sci, Visual Geometry Grp, Oxford, England

[2] Naver Corp, Seongnam Si, Gyeonggi Provin, South Korea

Source:

INTERSPEECH 2020 | 2020

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

speaker diarisation; speaker recognition; DIARIZATION;

DOI:

10.21437/Interspeech.2020-2337

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

The goal of this paper is speaker diarisation of videos collected 'in the wild'. We make three key contributions. First, we propose an automatic audio-visual diarisation method for YouTube videos. Our method consists of active speaker detection using audio-visual methods and speaker verification using self-enrolled speaker models. Second, we integrate our method into a semi-automatic dataset creation pipeline which significantly reduces the number of hours required to annotate videos with diarisation labels. Finally, we use this pipeline to create a large-scale diarisation dataset called VoxConverse, collected from 'in the wild' videos, which we will release publicly to the research community. Our dataset consists of overlapping speech, a large and diverse speaker pool, and challenging background conditions.


Pages: 299-303

Page count: 5
