Evaluating the effect of using different transcription schemes in building a speech recognition system for Arabic

Cited by: 10
Authors
Alsharhan, Eiman [1]
Ramsay, Allan [2]
Ahmed, Hanady [3]
Affiliations
[1] Kuwait Univ, Kuwait, Kuwait
[2] Univ Manchester, Manchester, Lancs, England
[3] Alexandria Univ, Alexandria, Egypt
Keywords
Natural language processing; Arabic speech recognition; Diacritisation; Phonetic transcription; MADAMIRA; SAMA; Phonological rules; GENERATION;
DOI
10.1007/s10772-020-09720-z
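Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code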
0808 ; 0809 ;
Abstract
It is well known that the Arabic language poses non-trivial issues for Automatic Speech Recognition (ASR) systems. This paper is concerned with the problems posed by the complex morphology of the language and the absence of diacritics in its written form. Several acoustic and language models are built using different transcription resources, namely a grapheme-based transcription which uses non-diacriticised text materials, phoneme-based transcriptions obtained from automatic diacritisation tools (SAMA or MADAMIRA), and a predefined dictionary. The paper presents a comprehensive assessment of these transcription schemes by employing them in building a collection of Arabic ASR systems using the GALE (phase 3) Arabic broadcast news and broadcast conversational speech datasets (LDC 2015), which include 260 h of recorded material. Contrary to our expectations, the experimental evidence shows that the use of grapheme-based transcription is superior to the use of phoneme-based transcription. To investigate this further, the MADAMIRA analysis is modified by applying a number of simple phonological rules. These modifications have a substantial effect on the systems' performance, but the result is still inferior to the use of a simple grapheme-based transcription. The research also examined the use of a manually diacriticised subset of the data in training the ASR system and compared it with the use of grapheme-based transcription and phoneme-based transcription obtained from MADAMIRA. The goal of this step is to validate MADAMIRA's analysis. The results show that using the manually diacriticised text in generating the phonetic transcription can significantly decrease the WER compared to the use of MADAMIRA diacriticised text and also the isolated graphemes.
The results obtained strongly indicate that providing the training model with less information about the data (only graphemes) is less damaging than providing it with inaccurate information.
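As a rough illustration of two ideas the abstract relies on, the sketch below shows how a grapheme-based lexicon entry could be derived from a word by discarding its diacritics, and how word error rate (WER) is conventionally computed from the word-level edit distance. It is not the authors' pipeline: the function names `strip_diacritics`, `grapheme_entry`, and `word_error_rate` are hypothetical, and the diacritic set is an assumption limited to the common Arabic marks U+064B to U+0652 plus the superscript alef U+0670.

```python
# Assumed diacritic inventory: tanwin, short vowels, shadda, sukun, superscript alef.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(word: str) -> str:
    """Return the word with the assumed Arabic diacritic marks removed."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def grapheme_entry(word: str) -> tuple[str, list[str]]:
    """Build a grapheme-based lexicon entry: the bare word and its letters as units."""
    bare = strip_diacritics(word)
    return bare, list(bare)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `grapheme_entry` applied to a fully diacriticised word returns its bare spelling together with one unit per letter, which is the kind of entry a grapheme-based pronunciation dictionary would hold. A phoneme-based scheme would instead need the diacritics (or a tool's guess at them) to produce the unit sequence.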
Pages: 43 - 56
Page count: 14
Related papers
  • [1] Evaluating the effect of using different transcription schemes in building a speech recognition system for Arabic
    Eiman Alsharhan
    Allan Ramsay
    Hanady Ahmed
    International Journal of Speech Technology, 2022, 25 : 43 - 56
  • [2] Investigation Arabic Speech Recognition Using CMU Sphinx System
    Satori, Hassan
    Hiyassat, Hussein
    Harti, Mostafa
    Chenfour, Noureddine
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2009, 6 (02) : 186 - 190
  • [3] Diacritics Effect on Arabic Speech Recognition
    Sa’ed Abed
    Mohammad Alshayeji
    Sari Sultan
    Arabian Journal for Science and Engineering, 2019, 44 : 9043 - 9056
  • [4] Diacritics Effect on Arabic Speech Recognition
    Abed, Sa'ed
    Alshayeji, Mohammad
    Sultan, Sari
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2019, 44 (11) : 9043 - 9056
  • [5] Multi-Level Improvement for a Transcription Generated by Automatic Speech Recognition System for Arabic
    Amich, Heithem
    Ben Mohamed, Mohamed
    Zrigui, Mounir
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2019, 16 (03) : 460 - 466
  • [6] A robust speech disorders correction system for Arabic language using visual speech recognition
    Farag, Ahmed
    El Adawy, Mohamed
    Ismail, Ahmed
    BIOMEDICAL RESEARCH-INDIA, 2013, 24 (02) : 185 - 192
  • [7] Mono-font cursive Arabic text recognition using speech recognition system
    Khorsheed, M. S.
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, PROCEEDINGS, 2006, 4109 : 755 - 763
  • [8] Arabic Speech Recognition System based on CMUSphinx
    Satori, H.
    Harti, M.
    Chenfour, N.
    ISCIII '07: 3RD INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, PROCEEDINGS, 2007, : 31 - +
  • [9] Arabic automatic segmentation system and its application for arabic speech recognition system
    Nofal, M
    Abdel-Raheem, E
    Kader, NSA
    Proceedings of the 46th IEEE International Midwest Symposium on Circuits & Systems, Vols 1-3, 2003, : 697 - 700
  • [10] A COMPLETE KALDI RECIPE FOR BUILDING ARABIC SPEECH RECOGNITION SYSTEMS
    Ali, Ahmed
    Zhang, Yifan
    Cardinal, Patrick
    Dahak, Najim
    Vogel, Stephan
    Glass, James
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 525 - 529