A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline

Citations: 0

Authors:

Khassanov, Yerbolat [1]

Mussakhojayeva, Saida [1]

Mirzakhmetov, Almas [1]

Adiyev, Alen [1]

Nurpeiissov, Mukhamet [1]

Varol, Huseyin Atakan [1]

Affiliations:

[1] Nazarbayev University, Institute of Smart Systems and Artificial Intelligence (ISSAI), Nur-Sultan, Kazakhstan

Source:

16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021) | 2021

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present an open-source speech corpus for the Kazakh language. The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups, as well as both genders. It was carefully inspected by native Kazakh speakers to ensure high quality. The KSC is the largest publicly available database developed to advance various Kazakh speech and language processing applications. In this paper, we first describe the data collection and preprocessing procedures followed by a description of the database specifications. We also share our experience and challenges faced during the database construction, which might benefit other researchers planning to build a speech corpus for a low-resource language. To demonstrate the reliability of the database, we performed preliminary speech recognition experiments. The experimental results imply that the quality of audio and transcripts is promising (2.8% character error rate and 8.7% word error rate on the test set). To enable experiment reproducibility and ease the corpus usage, we also released an ESPnet recipe for our speech recognition models.

Pages: 697-706

Page count: 10
