VoxCeleb: a large-scale speaker identification dataset

Cited by: 1062
Authors
Nagrani, Arsha [1]
Chung, Joon Son [1]
Zisserman, Andrew [1]
Affiliation
[1] Univ Oxford, Dept Engn Sci, Visual Geometry Grp, Oxford, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
speaker identification; speaker verification; large-scale; dataset; convolutional neural network;
DOI
10.21437/Interspeech.2017-950
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most existing datasets for speaker identification contain samples obtained under quite constrained conditions, and are usually hand-annotated, hence limited in size. The goal of this paper is to generate a large-scale, text-independent speaker identification dataset collected 'in the wild'. We make two contributions. First, we propose a fully automated pipeline based on computer vision techniques to create the dataset from open-source media. Our pipeline involves obtaining videos from YouTube, performing active speaker verification using a two-stream synchronization Convolutional Neural Network (CNN), and confirming the identity of the speaker using CNN-based facial recognition. We use this pipeline to curate VoxCeleb, which contains hundreds of thousands of 'real world' utterances for over 1,000 celebrities. Our second contribution is to apply and compare various state-of-the-art speaker identification techniques on our dataset to establish baseline performance. We show that a CNN-based architecture obtains the best performance for both identification and verification.
Pages: 2616-2620
Number of pages: 5