End-to-End Speech Recognition of Tamil Language

Cited by: 10
Authors
Changrampadi, Mohamed Hashim [1]
Shahina, A. [2]
Narayanan, M. Badri [2]
Khan, A. Nayeemulla [3]
Affiliations
[1] C Abdul Hakeem Coll Engn & Technol, Dept Elect & Commun Engn, Melvisharam 632509, India
[2] Sri Sivasubramaniya Nadar Coll Engn, Dept Informat Technol, Kalavakkam 603110, India
[3] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai 600127, Tamil Nadu, India
Keywords
End-to-end speech recognition; deep learning; under-resourced language; semi-supervised speech corpus development; SYSTEM; ASR
DOI
10.32604/iasc.2022.022021
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Research in speech recognition has recently produced numerous state-of-the-art results. However, relatively little research is being carried out in Automatic Speech Recognition (ASR) for low-resource languages. We present a method to develop a speech recognition model with minimal resources using the Mozilla DeepSpeech architecture. We used freely available online computational resources for training, so that similar approaches can be applied to research on low-resource languages in financially constrained environments. We also present novel ways to build an efficient language model from publicly available web resources to improve ASR accuracy. The proposed ASR model achieves a best result of 24.7% Word Error Rate (WER), compared to 55% WER for Google speech-to-text. We also demonstrate semi-supervised development of a speech corpus using our trained ASR model, indicating a cost-effective approach to building a large-vocabulary corpus for a low-resource language. The trained Tamil ASR model and the training sets are released in the public domain and are available on GitHub.
Pages: 1309-1323
Page count: 15