Phonetically rich and balanced text and speech corpora for Arabic language

Authors
Mohammad A. M. Abushariah
Raja N. Ainon
Roziati Zainuddin
Moustafa Elshafei
Othman O. Khalifa
Affiliations
[1] University of Malaya, Faculty of Computer Science and Information Technology
[2] University of Jordan, King Abdullah II School for Information Technology
[3] King Fahd University of Petroleum and Minerals, Department of Systems Engineering
[4] International Islamic University Malaysia, Electrical and Computer Engineering Department, Faculty of Engineering
Source
Language Resources and Evaluation, 2012, 46 (04)
Keywords
Modern Standard Arabic; Speech corpus; Text corpus; Phonetically rich; Phonetically balanced; Automatic continuous speech recognition
DOI
Not available
Abstract
This paper describes the preparation, recording, analysis, and evaluation of a new speech corpus for Modern Standard Arabic (MSA). The corpus contains a total of 415 sentences recorded by 40 native Arabic speakers (20 male and 20 female) from 11 Arab countries representing three major regions (Levant, Gulf, and Africa). Of these, 367 sentences are phonetically rich and balanced and are used for training Arabic Automatic Speech Recognition (ASR) systems. "Rich" means that the sentence set contains all phonemes of the Arabic language, whereas "balanced" means that it preserves the phonetic distribution of the Arabic language. The remaining 48 sentences were created for testing; they are mostly foreign to the training sentences and share hardly any words with them. To evaluate the corpus, Arabic ASR systems were developed using the Carnegie Mellon University (CMU) Sphinx 3 tools for both training and testing/decoding. The speech engine uses Hidden Markov Models (HMMs) with three emitting states for triphone-based acoustic models. Based on experimental analysis of about 8 h of training speech data, the acoustic model performs best with a continuous observation probability model of 16 Gaussian mixture distributions and state distributions tied to 500 senones. The language model contains unigrams, bigrams, and trigrams. For the same speakers with different sentences, the Arabic ASR systems obtained an average Word Error Rate (WER) of 9.70%. For different speakers with the same sentences, the average WER was 4.58%, whereas for different speakers with different sentences it was 12.39%.
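The abstract reports results as Word Error Rate (WER). As a reading aid, the following is a minimal Python sketch of the standard WER definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the scoring tool used by the authors, and the abstract does not say how their averages were aggregated.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name); tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 9.70% corresponds to roughly one word-level error per ten reference words.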
Pages: 601-634
Page count: 33
Citation
Abushariah, Mohammad A. M.; Ainon, Raja N.; Zainuddin, Roziati; Elshafei, Moustafa; Khalifa, Othman O. Phonetically rich and balanced text and speech corpora for Arabic language [J]. Language Resources and Evaluation, 2012, 46 (04): 601-634