Creation of Marathi Speech Corpus for Automatic Speech Recognition

Cited by: 0

Authors:

Gaikwad, Santosh [1]

Gawali, Bharti [1]

Mehrotra, Suresh [1]

Affiliation:

[1] Dr Babasaheb Ambedkar Marathwada Univ, Dept Comp Sci & Informat Technol, Aurangabad 431004, Maharashtra, India

Source:

2013 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2013 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE) | 2013

Keywords:

Audio; Corpus; CMU; Labeling; Annotation; Speaker; Gender; Communication; Praat

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper describes the collection of an audio corpus for the Marathi language, one of the regional languages of India. Marathi has 12 vowels and 36 consonants. The objective of the research is to create a speech corpus that can be used in automatic speech recognition systems for domains such as telephonic inquiry systems and teaching tutors. The corpus comprises 28,420 isolated words and 17,470 sentences collected from around 500 speakers. The speech utterances were recorded at 16 kHz using three recording media: a headset, a desktop-mounted microphone, and a mobile phone. The corpus is transcribed and annotated, and is available for use in recognition systems.
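
As an illustration of the 16 kHz recording condition stated in the abstract, the following is a minimal sketch of how a user of such a corpus might check the sampling rate of a recording. It assumes the audio is stored as WAV files, which the abstract does not state; the function name `has_expected_rate` is hypothetical and not part of the paper.

```python
import wave

# Sampling rate stated in the abstract (16 kHz).
EXPECTED_RATE_HZ = 16_000


def has_expected_rate(path, expected_hz=EXPECTED_RATE_HZ):
    """Return True if the WAV file at `path` uses the expected sampling rate.

    Assumption: recordings are WAV files; the abstract does not name a format.
    """
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == expected_hz
```

A check of this kind could be run over every file from each of the three recording media before training, so that a mismatched recording is caught early.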


Pages: 5