Creation of Marathi Speech Corpus for Automatic Speech Recognition

Authors
Gaikwad, Santosh [1]
Gawali, Bharti [1]
Mehrotra, Suresh [1]
Affiliation
[1] Dr Babasaheb Ambedkar Marathwada University, Department of Computer Science and Information Technology, Aurangabad 431004, Maharashtra, India
Keywords
Audio; Corpus; CMU; Labeling; Annotation; Speaker; Gender; Communication; Praat
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes the collection of an audio corpus for the Marathi language. Marathi is one of the regional languages of India and has 12 vowels and 36 consonants. The objective of the research is to create a speech corpus that can be used to build automatic speech recognition systems for various domains, such as telephone inquiry systems and teaching tutors. The collected corpus comprises 28,420 isolated words and 17,470 sentences from around 500 speakers. The speech utterances were recorded at 16 kHz using three recording media: a headset, a desktop-mounted microphone, and a mobile phone. The corpus has been transcribed and annotated, and is available for use in recognition systems.
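As an illustration only, the following Python sketch shows how a per-utterance record for a corpus like this one might be checked against the recording parameters stated in the abstract: 16 kHz audio, one of three recording media (headset, desktop microphone, mobile phone), an isolated word or a sentence, and a non-empty transcription. The `Utterance` fields, the medium labels, and the helpers `check_wav`, `validate`, and `make_silent_wav` are hypothetical; the paper does not describe its file layout or tooling, so this is not the authors' pipeline. It relies only on Python's standard `wave` module.

```python
import io
import wave
from dataclasses import dataclass

# Hypothetical labels for the three recording media named in the abstract.
RECORDING_MEDIA = {"headset", "desktop_microphone", "mobile_phone"}
# Sampling rate stated in the abstract.
EXPECTED_RATE_HZ = 16_000


@dataclass
class Utterance:
    """Hypothetical manifest entry; field names are illustrative assumptions."""
    speaker_id: str
    gender: str
    medium: str         # one of RECORDING_MEDIA
    kind: str           # "word" (isolated word) or "sentence"
    transcription: str  # Marathi text in Devanagari


def check_wav(data: bytes) -> bool:
    """Return True if the WAV bytes are sampled at 16 kHz."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return wf.getframerate() == EXPECTED_RATE_HZ


def validate(utt: Utterance, wav_bytes: bytes) -> list[str]:
    """Collect human-readable problems for one utterance (empty list = OK)."""
    problems = []
    if utt.medium not in RECORDING_MEDIA:
        problems.append(f"unknown medium: {utt.medium}")
    if utt.kind not in {"word", "sentence"}:
        problems.append(f"unknown utterance kind: {utt.kind}")
    if not utt.transcription.strip():
        problems.append("missing transcription")
    if not check_wav(wav_bytes):
        problems.append("sample rate is not 16 kHz")
    return problems


def make_silent_wav(rate: int, seconds: float = 0.1) -> bytes:
    """Build a short mono 16-bit silent WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    good = Utterance("spk001", "female", "headset", "word", "नमस्कार")
    print(validate(good, make_silent_wav(16_000)))  # []
    print(validate(good, make_silent_wav(8_000)))   # ['sample rate is not 16 kHz']
```

Returning a list of problems rather than raising on the first error lets a batch check over all recordings report every defect at once, which is usually more convenient when auditing a large corpus.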
Pages: 5