Evaluating the effect of using different transcription schemes in building a speech recognition system for Arabic

Cited by: 10
Authors
Alsharhan, Eiman [1 ]
Ramsay, Allan [2 ]
Ahmed, Hanady [3 ]
Affiliations
[1] Kuwait Univ, Kuwait, Kuwait
[2] Univ Manchester, Manchester, Lancs, England
[3] Alexandria Univ, Alexandria, Egypt
Keywords
Natural language processing; Arabic speech recognition; Diacritisation; Phonetic transcription; MADAMIRA; SAMA; Phonological rules; GENERATION;
DOI
10.1007/s10772-020-09720-z
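Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract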
It is well known that the Arabic language poses non-trivial issues for Automatic Speech Recognition (ASR) systems. This paper is concerned with the problems posed by the complex morphology of the language and the absence of diacritics in its written form. Several acoustic and language models are built using different transcription resources, namely a grapheme-based transcription which uses non-diacriticised text materials, phoneme-based transcriptions obtained from automatic diacritisation tools (SAMA or MADAMIRA), and a predefined dictionary. The paper presents a comprehensive assessment of these transcription schemes by employing them in building a collection of Arabic ASR systems using the GALE (phase 3) Arabic broadcast news and broadcast conversational speech datasets (LDC 2015), which include 260 h of recorded material. Contrary to our expectations, the experimental evidence shows that grapheme-based transcription is superior to phoneme-based transcription. To investigate this further, the MADAMIRA analysis is modified by applying a number of simple phonological rules. These modifications substantially improve the systems' performance, but it remains inferior to that obtained with a simple grapheme-based transcription. The research also examined the use of a manually diacriticised subset of the data in training the ASR system and compared it with grapheme-based transcription and with phoneme-based transcription obtained from MADAMIRA. The goal of this step is to validate MADAMIRA's analysis. The results show that generating the phonetic transcription from manually diacriticised text can significantly decrease the word error rate (WER) compared with both MADAMIRA-diacriticised text and isolated graphemes.
The results strongly indicate that providing the training model with less information about the data (only graphemes) is less damaging than providing it with inaccurate information.
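To make the two central notions in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a grapheme-based pronunciation is simply the sequence of written letters with any diacritics removed, and that WER is the word-level edit distance divided by the reference length. The function names and the diacritic range used here are illustrative assumptions; the paper's actual systems and tools (SAMA, MADAMIRA) are not modelled.

```python
import re

# Assumption: treat U+064B..U+0652 (tanween, short vowels, shadda, sukun)
# and U+0670 (dagger alif) as the diacritics to ignore.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def grapheme_pronunciation(word):
    """Return one unit per written letter, dropping any diacritics."""
    return list(DIACRITICS.sub("", word))


def build_grapheme_lexicon(words):
    """Map each word to its letter sequence (a hypothetical lexicon format)."""
    return {word: grapheme_pronunciation(word) for word in words}


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

In this sketch a diacriticised and a non-diacriticised spelling of the same word map to the same letter sequence, which is the sense in which the grapheme-based scheme gives the model less information rather than potentially inaccurate information.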
Pages: 43-56
Number of pages: 14