End-to-End Speech Recognition For Arabic Dialects

Cited by: 0

Authors:

Seham Nasr

Rehab Duwairi

Muhannad Quwaider

Affiliations:

[1] Jordan University of Science and Technology, Department of Computer Engineering

[2] Jordan University of Science and Technology, Department of Computer Information Systems

Source:

Arabian Journal for Science and Engineering | 2023 / Vol. 48

Keywords:

Automatic speech recognition; Arabic dialectal ASR; End-to-end Arabic ASR; Yemeni ASR; Jordanian ASR

DOI:

Not available

Abstract:

Automatic speech recognition (ASR), or speech-to-text, is a challenging human–machine interaction task that has attracted many researchers as well as companies such as Google, Amazon, and Facebook. End-to-end speech recognition is still in its infancy for low-resource languages such as Arabic and its dialects because transcribed corpora are scarce. In this paper, we introduce novel transcribed corpora for Yemeni Arabic, Jordanian Arabic, and multi-dialectal Arabic. We also design several baseline sequence-to-sequence deep neural models for end-to-end speech recognition of Arabic dialects. In addition, Mozilla’s DeepSpeech2 model was trained from scratch on our corpora. The Bidirectional Long Short-Term Memory (Bi-LSTM) model with attention achieved encouraging results on the Yemeni speech corpus, with a 59% Word Error Rate (WER) and a 51% Character Error Rate (CER). On the Jordanian speech corpus, the Bi-LSTM with attention achieved 83% WER and 70% CER, while on the multi-dialectal Yem-Jod-Arab speech corpus it achieved 53% WER and 39% CER. DeepSpeech2 outperformed the baseline models, reaching 31% WER and 24% CER on the Yemeni corpus and 68% WER and 40% CER on the Jordanian corpus. Lastly, DeepSpeech2 also gave better results on the multi-dialectal Arabic corpus, with 30% WER and 20% CER.
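The results above are reported as Word Error Rate (WER) and Character Error Rate (CER). As a minimal sketch of the standard definitions (not necessarily the scoring script the authors used, which the abstract does not name), both metrics are the Levenshtein edit distance between a reference transcript and a hypothesis, divided by the reference length, computed over words for WER and over characters for CER. The function names `edit_distance`, `wer`, and `cer` are illustrative, and counting spaces as characters in CER is an assumption, since normalization conventions differ between toolkits.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j] (one DP row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length.

    Assumption: spaces count as characters; toolkits normalize differently.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substituted word ("sat" -> "sit") and one deleted word ("the").
    print(f"WER = {wer(ref, hyp):.3f}")  # WER = 0.333
    print(f"CER = {cer(ref, hyp):.3f}")  # CER = 0.227
```

Because insertions are counted, WER and CER can exceed 100%. Corpus-level figures such as those in the abstract are commonly obtained by summing edit distances and reference lengths over all test utterances rather than averaging per-utterance rates.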

Pages: 10617-10633

Number of pages: 16
