Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus

Cited by: 0
Authors
Hamed, Injy [1]
Elmahdy, Mohamed [1]
Abdennadher, Slim [1]
Affiliations
[1] German Univ Cairo, Cairo, Egypt
Keywords
Speech corpus; Dialectal Egyptian Arabic; Conversational Egyptian Arabic; Egyptian Arabic-English; code-switching; code-mixing
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Speech corpora are key components needed by both linguists (for language analysis, research, and language teaching) and Natural Language Processing (NLP) researchers (for training and evaluating NLP tasks such as speech recognition, text-to-speech and speech-to-text synthesis). Despite the great demand, there is still a severe shortage of available corpora, especially for dialectal languages and code-switched speech. In this paper, we present our efforts in collecting and analyzing a speech corpus for conversational Egyptian Arabic. As in other multilingual societies, it is common among Egyptians to use a mix of Arabic and English in daily conversations. The act of switching languages, at sentence boundaries or within the same sentence, is referred to as code-switching. The aim of this work is threefold: (1) gather spontaneous conversational Egyptian Arabic speech, (2) obtain manual transcriptions, and (3) analyze the speech from the code-switching perspective. A subset of the transcriptions was manually annotated with part-of-speech (POS) tags. The POS distribution of the embedded words was analyzed, as well as the POS distribution of the trigger words (Arabic words preceding a code-switching point). The speech corpus can be obtained by contacting the authors.
Pages: 3805-3809
Number of pages: 5
Related papers
50 in total
  • [1] Cairo Student Code-Switch (CSCS) Corpus: An Annotated Egyptian Arabic-English Corpus
    Balabel, Mohamed
    Hamed, Injy
    Abdennadher, Slim
    Ngoc Thang Vu
    Cetinoglu, Oezlem
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 3973-3977
  • [2] ArzEn: A Speech Corpus for Code-switched Egyptian Arabic-English
    Hamed, Injy
    Ngoc Thang Vu
    Abdennadher, Slim
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 4237-4246
  • [3] Building a First Language Model for Code-switch Arabic-English
    Hamed, Injy
    Elmahdy, Mohamed
    Abdennadher, Slim
    [J]. ARABIC COMPUTATIONAL LINGUISTICS (ACLING 2017), 2017, 117: 208-216
  • [4] A FIRST SPEECH RECOGNITION SYSTEM FOR MANDARIN-ENGLISH CODE-SWITCH CONVERSATIONAL SPEECH
    Ngoc Thang Vu
    Lyu, Dau-Cheng
    Weiner, Jochen
    Telaar, Dominic
    Schlippe, Tim
    Blaicher, Fabian
    Chng, Eng-Siong
    Schultz, Tanja
    Li, Haizhou
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 4889-4892
  • [6] CODE-SWITCH SPEECH RESCORING WITH MONOLINGUAL DATA
    Liu, Guoyu
    Cao, Lixin
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 6229-6233
  • [7] LANGUAGE DIARIZATION FOR CODE-SWITCH CONVERSATIONAL SPEECH
    Lyu, Dau-Cheng
    Chng, Eng-Siong
    Li, Haizhou
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 7314-7318
  • [8] TEXTUAL DATA AUGMENTATION FOR ARABIC-ENGLISH CODE-SWITCHING SPEECH RECOGNITION
    Hussein, Amir
    Chowdhury, Shammur Absar
    Abdelali, Ahmed
    Dehak, Najim
    Ali, Ahmed
    Khudanpur, Sanjeev
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022: 777-784
  • [10] A DICTIONARY OF EGYPTIAN ARABIC - ARABIC-ENGLISH - HINDS,M, BADAWI,E
    IRWIN, R
    [J]. TLS-THE TIMES LITERARY SUPPLEMENT, 1988, (4424): 67