Common Voice: A Massively-Multilingual Speech Corpus

Cited by: 0

Authors:

Ardila, Rosana [1]

Branson, Megan [1]

Davis, Kelly [1]

Henretty, Michael [4]

Kohler, Michael [4]

Meyer, Josh [3]

Morais, Reuben [1]

Saunders, Lindsay [1]

Tyers, Francis M. [2]

Weber, Gregor [1]

Affiliations:

[1] Mozilla, Bloomington, IN 47408 USA

[2] Indiana Univ, Bloomington, IN USA

[3] Artie Inc, Bloomington, IN USA

[4] Various Cities, Los Angeles, CA USA

Source:

PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020) | 2020

Funding:

US National Science Foundation;

Keywords:

spoken corpus; Automatic Speech Recognition; low-resource languages;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

The Common Voice corpus is a massively-multilingual collection of transcribed speech intended for speech technology research and development. Common Voice is designed for Automatic Speech Recognition purposes but can be useful in other domains (e.g. language identification). To achieve scale and sustainability, the Common Voice project employs crowdsourcing for both data collection and data validation. The most recent release includes 29 languages, and as of November 2019 there are a total of 38 languages collecting data. Over 50,000 individuals have participated so far, resulting in 2,500 hours of collected audio. To our knowledge this is the largest audio corpus in the public domain for speech recognition, both in terms of number of hours and number of languages. As an example use case for Common Voice, we present speech recognition experiments using Mozilla's DeepSpeech Speech-to-Text toolkit. By applying transfer learning from a source English model, we find an average Character Error Rate improvement of 5.99 +/- 5.48 for twelve target languages (German, French, Italian, Turkish, Catalan, Slovenian, Welsh, Irish, Breton, Tatar, Chuvash, and Kabyle). For most of these languages, these are the first ever published results on end-to-end Automatic Speech Recognition.

Pages: 4218-4222

Page count: 5

