Bangladeshi Bangla speech corpus for automatic speech recognition research

Cited by: 7
Authors
Kibria, Shafkat [1]
Samin, Ahnaf Mozib [1]
Kobir, M. Humayon [1]
Rahman, M. Shahidur [1]
Selim, M. Reza [1]
Iqbal, M. Zafar [1]
Affiliation
[1] Shahjalal Univ Sci & Technol, Dept Comp Sci & Engn, Sylhet 3114, Bangladesh
Keywords
Bangladeshi Bangla corpus; Automatic speech recognition; Corpora evaluation; Recurrent neural network
DOI
10.1016/j.specom.2021.12.004
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This article reports the development of a language resource for Bangladeshi Bangla spoken language (BBSL). Bangladeshi Bangla lacks adequately large speech corpora for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. The accuracy of an automatic speech recognition (ASR) system rests on the quality of the speech corpus. This work discusses the common issues and activities related to the development of a large speech corpus named SUBAK.KO. The corpus is designed to support ASR research in Bangladeshi Bangla and is labeled sentence-wise. We trained a model on this corpus using one of the well-known current end-to-end ASR approaches, Recurrent Neural Networks (RNNs) with Connectionist Temporal Classification (CTC). To identify its strengths and weaknesses, we observed the Character Error Rate (CER) and the Word Error Rate (WER) of the trained RNN-CTC model. A model was also trained on another open-source large Bangla ASR corpus using the same ASR algorithm. The two trained models were compared to assess the quality of the corpora. SUBAK.KO was found to be a more balanced corpus that covers more regional accented speech variability for an LVCSR system than that open-source large Bangla ASR corpus.
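The abstract evaluates the trained model by CER and WER. As a minimal sketch of how these two metrics are conventionally computed (edit distance divided by reference length), the following may help; it is not the authors' evaluation code, and the function names `edit_distance`, `wer`, and `cer` are hypothetical. It also assumes whitespace tokenization for words and Unicode code points (including spaces) for characters, which may differ from the paper's handling of Bangla script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # assumption: each Unicode code point (spaces included) is one character
    return edit_distance(reference, hypothesis) / len(reference)
```

Because both rates are normalized by the reference length only, a hypothesis with many insertions can yield a value above 1.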
Pages: 84-97
Page count: 14