Evaluating the effect of using different transcription schemes in building a speech recognition system for Arabic

Cited by: 10
Authors
Alsharhan, Eiman [1]
Ramsay, Allan [2]
Ahmed, Hanady [3]
Affiliations
[1] Kuwait Univ, Kuwait, Kuwait
[2] Univ Manchester, Manchester, Lancs, England
[3] Alexandria Univ, Alexandria, Egypt
Keywords
Natural language processing; Arabic speech recognition; Diacritisation; Phonetic transcription; MADAMIRA; SAMA; Phonological rules; Generation
DOI
10.1007/s10772-020-09720-z
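Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809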
Abstract
It is well known that the Arabic language poses non-trivial problems for Automatic Speech Recognition (ASR) systems. This paper is concerned with the problems posed by the complex morphology of the language and by the absence of diacritics in its written form. Several acoustic and language models are built using different transcription resources, namely a grapheme-based transcription that uses non-diacriticised text, phoneme-based transcriptions obtained from automatic diacritisation tools (SAMA or MADAMIRA), and a predefined dictionary. The paper presents a comprehensive assessment of these transcription schemes by using them to build a collection of Arabic ASR systems on the GALE (phase 3) Arabic broadcast news and broadcast conversational speech datasets LDC (2015), which comprise 260 h of recorded material. Contrary to our expectations, the experimental evidence shows that grapheme-based transcription outperforms phoneme-based transcription. To investigate this further, the MADAMIRA analysis is modified by applying a number of simple phonological rules. These modifications substantially improve the systems' performance, but the results are still inferior to those obtained with a simple grapheme-based transcription. The research also examines the use of a manually diacriticised subset of the data in training the ASR system, and compares it with grapheme-based transcription and with phoneme-based transcription obtained from MADAMIRA; the goal of this step is to validate MADAMIRA's analysis. The results show that generating the phonetic transcription from the manually diacriticised text can significantly decrease the word error rate (WER) compared with using MADAMIRA-diacriticised text, and also compared with using isolated graphemes.
The results obtained strongly indicate that providing the training model with less information about the data (only graphemes) is less damaging than providing it with inaccurate information.
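To make the contrast between the grapheme-based and phoneme-based schemes concrete, the following is a minimal Python sketch of how pronunciation-lexicon entries might be derived from Buckwalter-transliterated words. The Buckwalter encoding, the function names `grapheme_entry` and `phoneme_entry`, and the small rule set (drop sukun, double the consonant under shadda, expand tanween to a short vowel plus /n/) are hypothetical illustrations chosen for this sketch; they are not the paper's phonological rules, nor the output format of SAMA or MADAMIRA.

```python
# Minimal sketch (hypothetical rules, NOT the paper's): derive grapheme-based
# and phoneme-based lexicon entries from Buckwalter-transliterated Arabic.
# Assumes shadda ("~") directly follows the consonant it geminates.

SHORT_VOWELS = {"a", "u", "i"}
SUKUN = "o"    # marks the absence of a vowel, so it produces no phone
SHADDA = "~"   # gemination: repeat the preceding consonant
TANWEEN = {"F": ["a", "n"], "N": ["u", "n"], "K": ["i", "n"]}  # nunation
DIACRITICS = SHORT_VOWELS | {SUKUN, SHADDA} | set(TANWEEN)


def grapheme_entry(word: str) -> list[str]:
    """Grapheme-based scheme: one unit per written letter, diacritics dropped."""
    return [ch for ch in word if ch not in DIACRITICS]


def phoneme_entry(diacritised: str) -> list[str]:
    """Phoneme-based scheme: derive a phone sequence from a diacritised word."""
    phones: list[str] = []
    for ch in diacritised:
        if ch == SUKUN:
            continue  # no vowel follows this consonant
        if ch == SHADDA:
            if phones:
                phones.append(phones[-1])  # geminate the preceding consonant
            continue
        if ch in TANWEEN:
            phones.extend(TANWEEN[ch])  # e.g. "N" -> /u n/
            continue
        phones.append(ch)  # consonant, short vowel or long-vowel letter
    return phones


if __name__ == "__main__":
    for word in ["kataba", "kat~aba", "kitAbN"]:
        print(word, grapheme_entry(word), phoneme_entry(word))
```

For example, `grapheme_entry("kataba")` yields `['k', 't', 'b']`, which is all a grapheme-based system sees of the undiacritised word, while `phoneme_entry("kataba")` yields `['k', 'a', 't', 'a', 'b', 'a']`. The phoneme-based entry carries more information, but it depends entirely on the diacritisation being correct: a wrongly diacritised input yields a wrong phone sequence, whereas the grapheme entry simply omits the vowels.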
Pages: 43-56
Number of pages: 14