Bangladeshi Bangla speech corpus for automatic speech recognition research

Cited by: 7

Authors:

Kibria, Shafkat [1]

Samin, Ahnaf Mozib [1]

Kobir, M. Humayon [1]

Rahman, M. Shahidur [1]

Selim, M. Reza [1]

Iqbal, M. Zafar [1]

Affiliation:

[1] Shahjalal Univ Sci & Technol, Dept Comp Sci & Engn, Sylhet 3114, Bangladesh

Source:

SPEECH COMMUNICATION | 2022 / Vol. 136

Keywords:

Bangladeshi Bangla corpus; Automatic speech recognition; Corpora evaluation; Recurrent neural network

DOI:

10.1016/j.specom.2021.12.004

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This article reports the development of a language resource for Bangladeshi Bangla spoken language (BBSL). Bangladeshi Bangla lacks adequately large speech corpora for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. The accuracy of an automatic speech recognition (ASR) system depends on the quality of the speech corpus. This work discusses the common issues and activities involved in developing a large speech corpus named SUBAK.KO. The corpus is designed to support ASR research in Bangladeshi Bangla and is labeled sentence-wise. We trained a model on this corpus with one of the well-known current end-to-end ASR approaches, Recurrent Neural Networks (RNNs) with Connectionist Temporal Classification (CTC). To identify its strengths and weaknesses, we observed the Character Error Rate (CER) and the Word Error Rate (WER) of the trained RNN-CTC model. A second model was trained on another open-source large Bangla ASR corpus using the same ASR algorithm, and the two trained models were compared to assess the quality of the corpora. SUBAK.KO was found to be a more balanced corpus that covers more regional accent variability for an LVCSR system than the other open-source large Bangla ASR corpus.
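The abstract evaluates the trained models by CER and WER. As a rough illustration of how these two metrics are commonly computed, the sketch below uses the standard Levenshtein edit distance normalized by reference length. It is not the authors' evaluation code; the function names (`edit_distance`, `wer`, `cer`) and the example sentences are assumptions made for illustration, and characters are taken to be Unicode code points, which may not match how the paper segments Bangla text.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (hypothetical helper)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length.

    Assumption: each Unicode code point (spaces included) counts as one character.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Illustrative usage with placeholder English text (not data from the corpus).
reference = "the cat sat"
hypothesis = "the cat sit"
word_error = wer(reference, hypothesis)
char_error = cer(reference, hypothesis)
```

Because both rates are normalized by the reference length, they can exceed 1.0 when the hypothesis contains many insertions.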

Pages: 84-97

Page count: 14

50 items in total

[21] Using Automatic Speech Recognition in Spoken Corpus Curation
Gorisch, Jan
Gref, Michael
Schmidt, Thomas
PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 6423-6428
[22] BANGLA ISOLATED WORD SPEECH RECOGNITION
Firoze, Adnan
Arifin, M. Shamsul
Quadir, Ryana
Rahman, Rashedur M.
ICEIS 2011: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 2, 2011: 73-82
[23] Bangla Speech Recognition for Voice Search
Saurav, Jillur Rahman
Amin, Shakhawat
Kibria, Shafkat
Rahman, M. Shahidur
2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018
[24] PaSCoNT - Parallel Speech Corpus of Northern-central Thai for automatic speech recognition
Taerungruang, Supawat
Taninpong, Phimphaka
Chunwijitra, Vataya
Thatphithakkul, Sumonmas
Kasuriya, Sawit
Inthanon, Viroj
Paksaranuwat, Pawat
Thumronglaohapun, Salinee
Nakharutai, Nawapon
Inkeaw, Papangkorn
Bootkrajang, Jakramate
COMPUTER SPEECH AND LANGUAGE, 2025, 89
[25] Speech corpus recycling for acoustic cross-domain environments for automatic speech recognition
Ichikawa, Osamu
Rennie, Steven J.
Fukuda, Takashi
Willett, Daniel
ACOUSTICAL SCIENCE AND TECHNOLOGY, 2016, 37 (02): 55-65
[26] RODIGITS - A ROMANIAN CONNECTED-DIGITS SPEECH CORPUS FOR AUTOMATIC SPEECH AND SPEAKER RECOGNITION
Georgescu, Alexandru Lucian
Caranica, Alexandru
Cucu, Horia
Burileanu, Corneliu
UNIVERSITY POLITEHNICA OF BUCHAREST SCIENTIFIC BULLETIN SERIES C-ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, 2018, 80 (03): 45-62
[27] Automatic speech activity detection, source localization, and speech recognition on the CHIL seminar corpus
Macho, D
Padrell, J
Abad, A
Nadeu, C
Hernando, J
McDonough, J
Wölfel, M
Klee, W
Omologo, M
Brutti, A
Svaizer, P
Potamianos, G
Chu, SM
2005 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1 AND 2, 2005: 877-880
[28] ALGERIAN ARABIC SPEECH DATABASE (ALGASD): CORPUS DESIGN AND AUTOMATIC SPEECH RECOGNITION APPLICATION
Droua-Hamdani, Ghania
Selouani, Sid Ahmed
Boudraa, Malika
ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2010, 35 (2C): 157-166
[29] JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research
Itou, Katunobu
Yamamoto, Mikio
Takeda, Kazuya
Takezawa, Toshiyuki
Matsuoka, Tatsuo
Kobayashi, Tetsunori
Shikano, Kiyohiro
Itahashi, Shuichi
Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi), 1999, 20 (03): 199-206
[30] Corpus Construction for Deaf Speakers and Analysis by Automatic Speech Recognition
Kobayashi, Akio
Yasu, Keiichi
2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023: 2294-2298
