Bangladeshi Bangla speech corpus for automatic speech recognition research

Cited by: 7

Authors:

Kibria, Shafkat [1]

Samin, Ahnaf Mozib [1]

Kobir, M. Humayon [1]

Rahman, M. Shahidur [1]

Selim, M. Reza [1]

Iqbal, M. Zafar [1]

Affiliation:

[1] Shahjalal Univ Sci & Technol, Dept Comp Sci & Engn, Sylhet 3114, Bangladesh

Source:

SPEECH COMMUNICATION | 2022 / Vol. 136

Keywords:

Bangladeshi Bangla corpus; Automatic speech recognition; Corpora evaluation; Recurrent neural network

DOI:

10.1016/j.specom.2021.12.004

Chinese Library Classification number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

This article reports the development of a language resource for Bangladeshi Bangla spoken language (BBSL). Bangladeshi Bangla lacks adequate large speech corpora for Large Vocabulary Continuous Speech Recognition (LVCSR) systems, and the accuracy of an automatic speech recognition (ASR) system depends on the quality of its speech corpus. This work discusses the common issues and activities involved in developing a large speech corpus named SUBAK.KO. The corpus is designed to support ASR research in Bangladeshi Bangla and is labeled sentence-wise. We trained a model on this corpus with one of the well-known current End-to-End ASR algorithms, Recurrent Neural Networks (RNNs) with Connectionist Temporal Classification (CTC). To identify its strengths and weaknesses, we observed the Character Error Rate (CER) and Word Error Rate (WER) of the trained RNN-CTC model. A model was also trained on another open-source large Bangla ASR corpus using the same ASR algorithm, and the two trained models were compared to assess the quality of the corpora. The comparison found that SUBAK.KO is a more balanced corpus and covers more regional accent variability for an LVCSR system than that open-source large Bangla ASR corpus.
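
The abstract evaluates the trained model by CER and WER but does not define them. As a rough illustration only, both are commonly computed as the Levenshtein (edit) distance between the reference transcript and the recognizer output, divided by the reference length, counted in characters for CER and in words for WER. The sketch below follows that common definition; the function names are hypothetical, and the choices to split words on whitespace and to count characters as Unicode code points (including spaces) are assumptions, not details taken from the article.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate; assumes words are separated by whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate; assumes one character per Unicode code point."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat"
    hyp = "the cat sit"
    print(f"WER = {wer(ref, hyp):.3f}, CER = {cer(ref, hyp):.3f}")
```

For Bangla text, counting code points treats a consonant and its attached vowel sign as separate characters; an evaluation that counts grapheme clusters instead would give different CER values, so the unit should be stated when comparing corpora.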


Pages: 84-97

Number of pages: 14
