VoxCeleb: a large-scale speaker identification dataset

Cited by: 1062

Authors:

Nagrani, Arsha [1]

Chung, Joon Son [1]

Zisserman, Andrew [1]

Affiliation:

[1] Univ Oxford, Dept Engn Sci, Visual Geometry Grp, Oxford, England

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK

Keywords:

speaker identification; speaker verification; large-scale; dataset; convolutional neural network;

DOI:

10.21437/Interspeech.2017-950

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Most existing datasets for speaker identification contain samples obtained under quite constrained conditions and are usually hand-annotated, hence limited in size. The goal of this paper is to generate a large-scale, text-independent speaker identification dataset collected 'in the wild'. We make two contributions. First, we propose a fully automated pipeline based on computer vision techniques to create the dataset from open-source media. Our pipeline involves obtaining videos from YouTube, performing active speaker verification using a two-stream synchronization Convolutional Neural Network (CNN), and confirming the identity of the speaker using CNN-based facial recognition. We use this pipeline to curate VoxCeleb, which contains hundreds of thousands of 'real world' utterances for over 1,000 celebrities. Our second contribution is to apply and compare various state-of-the-art speaker identification techniques on our dataset to establish baseline performance. We show that a CNN-based architecture obtains the best performance for both identification and verification.
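The abstract's curation pipeline amounts to a two-stage filter applied to candidate face tracks. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the YouTube download and face tracking have already produced tracks, and that each track carries a precomputed audio-visual synchronization confidence (from the two-stream CNN) and per-identity face-recognition scores. The record type `FaceTrack`, the function names, and the threshold values are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FaceTrack:
    """One face track cut from a downloaded video (hypothetical record type)."""
    video_id: str
    start_s: float
    end_s: float
    # Audio-visual sync confidence from a two-stream CNN (assumed precomputed).
    sync_confidence: float
    # Per-identity scores from a CNN face classifier (assumed precomputed).
    identity_scores: dict[str, float] = field(default_factory=dict)


def is_active_speaker(track: FaceTrack, sync_threshold: float) -> bool:
    # Active speaker verification: keep the track only if lip motion
    # and audio appear synchronized.
    return track.sync_confidence >= sync_threshold


def matches_identity(track: FaceTrack, target: str, id_threshold: float) -> bool:
    # Identity confirmation: the face classifier must rank the target
    # celebrity first, with a score at or above the threshold.
    if not track.identity_scores:
        return False
    best = max(track.identity_scores, key=track.identity_scores.get)
    return best == target and track.identity_scores[best] >= id_threshold


def curate(tracks, target, sync_threshold=0.5, id_threshold=0.9):
    """Return (video_id, start_s, end_s) segments accepted as utterances of `target`.

    The default thresholds are illustrative assumptions, not values from the paper.
    """
    return [
        (t.video_id, t.start_s, t.end_s)
        for t in tracks
        if is_active_speaker(t, sync_threshold)
        and matches_identity(t, target, id_threshold)
    ]
```

Keeping the two checks as separate predicates mirrors the order given in the abstract: a track that fails the synchronization check (for instance, a voice-over while the celebrity's face is on screen) is discarded before identity is considered.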


Pages: 2616-2620

Number of pages: 5
