End-to-End Speech Recognition For Arabic Dialects

Cited: 4
Authors
Nasr, Seham [1 ]
Duwairi, Rehab [2 ]
Quwaider, Muhannad [1 ]
Affiliations
[1] Jordan Univ Sci & Technol, Dept Comp Engn, Al Ramtha 22110, Irbid, Jordan
[2] Jordan Univ Sci & Technol, Dept Comp Informat Syst, Al Ramtha 22110, Irbid, Jordan
Keywords
Automatic speech recognition; Arabic dialectal ASR; End-to-end Arabic ASR; Yemeni ASR; Jordanian ASR; CONVOLUTIONAL NEURAL-NETWORKS; UNDER-RESOURCED LANGUAGES; SYSTEM;
DOI
10.1007/s13369-023-07670-7
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07; 0710; 09;
Abstract
Automatic speech recognition, or speech-to-text, is a human-machine interaction task that, despite its difficulty, has attracted many researchers and companies such as Google, Amazon, and Facebook. End-to-end speech recognition is still in its infancy for low-resource languages such as Arabic and its dialects because of the lack of transcribed corpora. In this paper, we introduce novel transcribed corpora for Yemeni Arabic, Jordanian Arabic, and multi-dialectal Arabic. We also design several baseline sequence-to-sequence deep neural models for end-to-end speech recognition of Arabic dialects. Moreover, the DeepSpeech2 model was trained from scratch on our corpora. The Bidirectional Long Short-Term Memory (Bi-LSTM) with attention model achieved encouraging results on the Yemeni speech corpus, with 59% Word Error Rate (WER) and 51% Character Error Rate (CER). On the Jordanian speech corpus, Bi-LSTM with attention achieved 83% WER and 70% CER; by comparison, on the multi-dialectal Yem-Jod-Arab speech corpus, it achieved 53% WER and 39% CER. The DeepSpeech2 model surpassed the baseline models, with 31% WER and 24% CER on the Yemeni corpus and 68% WER and 40% CER on the Jordanian corpus. Lastly, DeepSpeech2 also gave better results on the multi-dialectal Arabic corpus, with 30% WER and 20% CER.
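The WER and CER figures quoted above are both edit-distance-based metrics: the minimum number of insertions, deletions, and substitutions needed to turn the system output into the reference, divided by the reference length (in words for WER, in characters for CER). A minimal sketch of how these metrics are computed, using a standard Levenshtein distance (illustration only, not the authors' evaluation code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(
                d[j] + 1,          # deletion of a reference token
                d[j - 1] + 1,      # insertion of a hypothesis token
                prev + (r != h),   # substitution (free if tokens match)
            )
    return d[len(hyp)]

def wer(ref, hyp):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character Error Rate: char-level edit distance / reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)
```

Note that both metrics can exceed 100% when the hypothesis contains many insertions, which is why dialectal ASR systems trained on small corpora can report error rates close to or above the reference length.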
Pages: 10617-10633 (17 pages)
Related Papers (items [31]-[40] of 50)
  • [31] Mamyrbayev, O.Z.; Oralbekova, D.O.; Alimhan, K.; Nuranbayeva, B.M. Hybrid end-to-end model for Kazakh speech recognition. International Journal of Speech Technology, 2023, 26(2): 261-270
  • [32] Ollerenshaw, Anna; Jalal, Asif; Hain, Thomas. Insights on Neural Representations for End-to-End Speech Recognition. INTERSPEECH 2021, 2021: 4079-4083
  • [33] Sun, Ting-Wei. End-to-End Speech Emotion Recognition With Gender Information. IEEE Access, 2020, 8: 152423-152438
  • [34] Tsunoo, Emiru; Kashiwagi, Yosuke; Narisetty, Chaitanya; Watanabe, Shinji. Residual Language Model for End-to-end Speech Recognition. INTERSPEECH 2022, 2022: 3899-3903
  • [35] Pundak, Golan; Sainath, Tara N.; Prabhavalkar, Rohit; Kannan, Anjuli; Zhao, Ding. Deep Context: End-to-End Contextual Speech Recognition. 2018 IEEE Workshop on Spoken Language Technology (SLT 2018), 2018: 418-425
  • [36] Nozaki, Jumon; Kawahara, Tatsuya; Ishizuka, Kenkichi; Hashimoto, Taiichi. End-to-end Speech-to-Punctuated-Text Recognition. INTERSPEECH 2022, 2022: 1811-1815
  • [37] Xu, Fan; Luo, Jian; Wang, Mingwen; Zhou, Guodong. Speech-Driven End-to-End Language Discrimination toward Chinese Dialects. ACM Transactions on Asian and Low-Resource Language Information Processing, 2020, 19(5)
  • [38] Battenberg, Eric; Chen, Jitong; Child, Rewon; Coates, Adam; Gaur, Yashesh; Li, Yi; Liu, Hairong; Satheesh, Sanjeev; Sriram, Anuroop; Zhu, Zhenyao. Exploring Neural Transducers for End-to-End Speech Recognition. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017: 206-213
  • [39] Karita, Shigeki; Watanabe, Shinji; Iwata, Tomoharu; Ogawa, Atsunori; Delcroix, Marc. Semi-Supervised End-to-End Speech Recognition. 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), 2018: 2-6
  • [40] Li, Mohan; Liu, Min; Masanori, Hattori. End-to-End Speech Recognition with Adaptive Computation Steps. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019: 6246-6250