Evaluating the effect of using different transcription schemes in building a speech recognition system for Arabic

Cited by: 10
Authors
Alsharhan, Eiman [1]
Ramsay, Allan [2]
Ahmed, Hanady [3]
Affiliations
[1] Kuwait Univ, Kuwait, Kuwait
[2] Univ Manchester, Manchester, Lancs, England
[3] Alexandria Univ, Alexandria, Egypt
Keywords
Natural language processing; Arabic speech recognition; Diacritisation; Phonetic transcription; MADAMIRA; SAMA; Phonological rules; GENERATION
DOI
10.1007/s10772-020-09720-z
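Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract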
It is well known that the Arabic language poses non-trivial issues for Automatic Speech Recognition (ASR) systems. This paper is concerned with the problems posed by the complex morphology of the language and the absence of diacritics in its written form. Several acoustic and language models are built using different transcription resources, namely a grapheme-based transcription which uses non-diacriticised text materials, phoneme-based transcriptions obtained from automatic diacritisation tools (SAMA or MADAMIRA), and a predefined dictionary. The paper presents a comprehensive assessment of these transcription schemes by employing them in building a collection of Arabic ASR systems using the GALE (phase 3) Arabic broadcast news and broadcast conversational speech datasets (LDC 2015), which include 260 h of recorded material. Contrary to our expectations, the experimental evidence shows that the use of grapheme-based transcription is superior to the use of phoneme-based transcription. To investigate this further, the MADAMIRA analysis is modified by applying a number of simple phonological rules. These improvements have a substantial effect on the systems' performance, which nevertheless remains inferior to that obtained with a simple grapheme-based transcription. The research also examines the use of a manually diacriticised subset of the data in training the ASR system and compares it with the use of grapheme-based transcription and of phoneme-based transcription obtained from MADAMIRA. The goal of this step is to validate MADAMIRA's analysis. The results show that using the manually diacriticised text to generate the phonetic transcription can significantly decrease the word error rate (WER) compared with using MADAMIRA-diacriticised text or isolated graphemes.
The results obtained strongly indicate that providing the training model with less information about the data (only graphemes) is less damaging than providing it with inaccurate information.
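As a rough illustration of two notions the abstract relies on, the Python sketch below builds a grapheme-based lexicon (each word is "pronounced" as its own letter sequence) and computes WER as word-level edit distance divided by reference length. It is not the authors' pipeline: the function names `grapheme_lexicon` and `wer` are hypothetical, the example word is an assumed Buckwalter-style transliteration, and the WER definition is the standard one rather than anything specified in the abstract.

```python
def grapheme_lexicon(words):
    """Map each distinct word to its letter sequence.

    Assumption: the text is non-diacriticised, so every character
    is treated as one grapheme unit.
    """
    return {w: list(w) for w in sorted(set(words))}


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # "ktb" is an assumed Buckwalter-style transliteration used only as sample input.
    print(grapheme_lexicon(["ktb", "ktb", "qlm"]))
    print(wer("a b c d", "a x c"))
```

A phoneme-based lexicon would replace `list(w)` with the output of a diacritisation and phonological rule step, which is where, according to the abstract, inaccurate information can enter the system.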
Pages: 43-56
Number of pages: 14