Deep learning based large vocabulary continuous speech recognition of an under-resourced language Bangladeshi Bangla

Cited by: 4
Authors
Samin, Ahnaf Mozib [1]
Kobir, M. Humayon [1]
Kibria, Shafkat [1]
Rahman, M. Shahidur [1]
Affiliations
[1] Shahjalal Univ Sci & Technol, Dept Comp Sci & Engn, Sylhet 3114, Bangladesh
Keywords
Large vocabulary continuous speech recognition; Convolutional neural network; Recurrent neural network; Language modeling; Bangladeshi Bangla; NEURAL-NETWORKS; MODELS;
DOI
10.1250/ast.42.252
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Research in corpus-driven Automatic Speech Recognition (ASR) is advancing rapidly towards robust Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Under-resourced languages such as Bangla require benchmarking on large corpora so that LVCSR research can address their limitations and avoid biased results. In this paper, a publicly available large-scale Bangladeshi Bangla speech corpus is used to implement a deep Convolutional Neural Network (CNN) based model and a Recurrent Neural Network (RNN) based model with the Connectionist Temporal Classification (CTC) loss function for Bangla LVCSR. In experimental evaluations, we find that the CNN-based architecture yields superior results to the RNN-based approach. This study also assesses the quality of an open-source large-scale Bangladeshi Bangla speech corpus and investigates the effect of various high-order N-gram Language Models (LMs) on Bangla, a morphologically rich language. We achieve a 36.12% word error rate (WER) using the CNN-based acoustic model and a 13.93% WER using beam search decoding with a 5-gram LM. The findings demonstrate state-of-the-art performance for a Bangla LVCSR system on a specific benchmarked large corpus.
Pages: 252-260
Page count: 9