End-to-End Speech Recognition For Arabic Dialects

被引:4
|
作者
Nasr, Seham [1 ]
Duwairi, Rehab [2 ]
Quwaider, Muhannad [1 ]
机构
[1] Jordan Univ Sci & Technol, Dept Comp Engn, Al Ramtha, Al Ramtha 22110, Irbid, Jordan
[2] Jordan Univ Sci & Technol, Dept Comp Informat Syst, Al Ramtha 22110, Irbid, Jordan
关键词
Automatic speech recognition; Arabic dialectal ASR; End-to-end Arabic ASR; Yemeni ASR; Jordanian ASR; CONVOLUTIONAL NEURAL-NETWORKS; UNDER-RESOURCED LANGUAGES; SYSTEM;
D O I
10.1007/s13369-023-07670-7
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
Automatic speech recognition or speech-to-text is a human-machine interaction task, and although it is challenging, it is attracting several researchers and companies such as Google, Amazon, and Facebook. End-to-end speech recognition is still in its infancy for low-resource languages such as Arabic and its dialects due to the lack of transcribed corpora. In this paper, we have introduced novel transcribed corpora for Yamani Arabic, Jordanian Arabic, and multi-dialectal Arabic. We also designed several baseline sequence-to-sequence deep neural models for Arabic dialects' end-to-end speech recognition. Moreover, Mozilla's DeepSpeech2 model was trained from scratch using our corpora. The Bidirectional Long Short-Term memory (Bi-LSTM) with attention model achieved encouraging results on the Yamani speech corpus with 59% Word Error Rate (WER) and 51% Character Error Rate (CER). The Bi-LSTM with attention achieved, on the Jordanian speech corpus, 83% WER and 70% CER. By comparison, the model achieved, on the multi-dialectal Yem-Jod-Arab speech corpus, 53% WER and 39% CER. The performance of the DeepSpeech2 model has superseded the performance of the baseline models with 31% WER and 24% CER for the Yamani corpus; 68 WER and 40 CER for the Jordanian corpus. Lastly, DeepSpeech2 gave better results, on multi-dialectal Arabic corpus, with 30% WER and 20% CER.
引用
下载
收藏
页码:10617 / 10633
页数:17
相关论文
共 50 条
  • [21] An Overview of End-to-End Automatic Speech Recognition
    Wang, Dong
    Wang, Xiaodong
    Lv, Shaohe
    SYMMETRY-BASEL, 2019, 11 (08):
  • [22] TRIGGERED ATTENTION FOR END-TO-END SPEECH RECOGNITION
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5666 - 5670
  • [23] Performance Monitoring for End-to-End Speech Recognition
    Li, Ruizhi
    Sell, Gregory
    Hermansky, Hynek
    INTERSPEECH 2019, 2019, : 2245 - 2249
  • [24] End-to-End Speech Recognition and Disfluency Removal
    Lou, Paria Jamshid
    Johnson, Mark
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2051 - 2061
  • [25] End-to-End Speech Recognition in Agglutinative Languages
    Mamyrbayev, Orken
    Alimhan, Keylan
    Zhumazhanov, Bagashar
    Turdalykyzy, Tolganay
    Gusmanova, Farida
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT II, 2020, 12034 : 391 - 401
  • [26] End-to-end Korean Digits Speech Recognition
    Roh, Jong-hyuk
    Cho, Kwantae
    Kim, Youngsam
    Cho, Sangrae
    2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019, : 1137 - 1139
  • [27] SPEECH ENHANCEMENT USING END-TO-END SPEECH RECOGNITION OBJECTIVES
    Subramanian, Aswin Shanmugam
    Wang, Xiaofei
    Baskar, Murali Karthick
    Watanabe, Shinji
    Taniguchi, Toru
    Tran, Dung
    Fujita, Yuya
    2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, : 234 - 238
  • [28] DeepOnKHATT: An End-to-End Arabic Online Handwriting Recognition System
    Alwajih, Fakhraddin
    Badr, Eman
    Abdou, Sherif
    Fahmy, Aly
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35 (11)
  • [29] Phonetically Induced Subwords for End-to-End Speech Recognition
    Papadourakis, Vasileios
    Mueller, Markus
    Liu, Jing
    Mouchtaris, Athanasios
    Omologo, Maurizio
    INTERSPEECH 2021, 2021, : 1992 - 1996
  • [30] Adapting End-to-End Speech Recognition for Readable Subtitles
    Liu, Danni
    Niehues, Jan
    Spanakis, Gerasimos
    17TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2020), 2020, : 247 - 256