Arabic speech recognition by end-to-end, modular systems and human

Cited by: 16
Authors
Hussein, Amir [1, 2]
Watanabe, Shinji [3]
Ali, Ahmed [1]
Affiliations
[1] HBKU, Qatar Comp Res Inst, Doha, Qatar
[2] Kanari AI, Pasadena, CA USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Dialectal Arabic; End-to-end speech recognition; Human speech recognition; Modern Standard Arabic; Transformer
DOI
10.1016/j.csl.2021.101272
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in automatic speech recognition (ASR) have achieved accuracy levels comparable to human transcribers, which has led researchers to debate whether the machine has reached human performance. Previous work focused on the English language and on modular hidden Markov model-deep neural network (HMM-DNN) systems. In this paper, we perform a comprehensive benchmarking of end-to-end transformer ASR, modular HMM-DNN ASR, and human speech recognition (HSR) on the Arabic language and its dialects. For HSR, we evaluate linguist performance and lay native-speaker performance on a new dataset collected as part of this study. For ASR, the end-to-end system achieved word error rates (WER) of 12.5%, 27.5%, and 33.8% on the MGB2, MGB3, and MGB5 challenges, respectively, a new performance milestone for each. Our results suggest that human performance on the Arabic language is still considerably better than the machine, with an absolute WER gap of 3.5% on average.
Pages: 17