A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline

Authors
Khassanov, Yerbolat [1]
Mussakhojayeva, Saida [1]
Mirzakhmetov, Almas [1]
Adiyev, Alen [1]
Nurpeiissov, Mukhamet [1]
Varol, Huseyin Atakan [1]
Affiliations
[1] Nazarbayev University, Institute of Smart Systems and Artificial Intelligence (ISSAI), Nur-Sultan, Kazakhstan
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present an open-source speech corpus for the Kazakh language. The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups, as well as both genders. It was carefully inspected by native Kazakh speakers to ensure high quality. The KSC is the largest publicly available database developed to advance various Kazakh speech and language processing applications. In this paper, we first describe the data collection and preprocessing procedures followed by a description of the database specifications. We also share our experience and challenges faced during the database construction, which might benefit other researchers planning to build a speech corpus for a low-resource language. To demonstrate the reliability of the database, we performed preliminary speech recognition experiments. The experimental results imply that the quality of audio and transcripts is promising (2.8% character error rate and 8.7% word error rate on the test set). To enable experiment reproducibility and ease the corpus usage, we also released an ESPnet recipe for our speech recognition models.
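The character error rate (CER) and word error rate (WER) quoted in the abstract are conventionally defined as the edit distance between the reference transcript and the recognizer output, divided by the reference length. The sketch below illustrates that standard definition only; it is not the scoring code from the authors' ESPnet recipe. The function names `edit_distance` and `error_rate` are hypothetical, the example strings are toy English text rather than corpus data, and dropping spaces before computing CER is an assumption (conventions differ between toolkits).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (tokens or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference unit
                curr[j - 1] + 1,     # insertion of a hypothesis unit
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate: total edits / total reference units.

    unit="word" gives WER; unit="char" gives CER.
    Assumption: spaces are removed before counting characters.
    """
    total_errors = 0
    total_units = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            r, h = ref.split(), hyp.split()
        else:
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        total_errors += edit_distance(r, h)
        total_units += len(r)
    if total_units == 0:
        raise ValueError("reference transcripts are empty")
    return total_errors / total_units


# Toy usage with made-up strings (not taken from the KSC).
references = ["the cat sat"]
hypotheses = ["the cat sit"]
wer = error_rate(references, hypotheses, unit="word")
cer = error_rate(references, hypotheses, unit="char")
```

Because the edits are summed over the whole test set before dividing, long utterances weigh more than short ones, which is the usual way corpus-level CER and WER are reported.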
Pages: 697-706
Page count: 10