Improving Egyptian-to-English SMT by Mapping Egyptian into MSA

Cited by: 0
Authors
Durrani, Nadir [1]
Al-Onaizan, Yaser [2]
Ittycheriah, Abraham [2]
Affiliations
[1] Univ Edinburgh, Edinburgh EH8 9YL, Midlothian, Scotland
[2] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
One of the aims of the DARPA BOLT project is to translate Egyptian blog data into English. While parallel data for Modern Standard Arabic (MSA)-English is abundant, it is sparse for Egyptian-English and Egyptian-MSA. A notable drop in translation quality is observed when translating Egyptian to English compared with translating MSA to English. One reason for this drop is the high out-of-vocabulary (OOV) rate, whereas another is the dialectal difference between training and test data. This work focuses on improving Egyptian-to-English translation by bridging the gap between Egyptian and MSA. First, we try to reduce the OOV rate by proposing MSA candidates for unknown Egyptian words through methods such as spelling correction and suggesting synonyms based on context. Second, we apply a convolution model using English as a pivot to map Egyptian words into MSA. We then evaluate our edits by running a decoder built on MSA-to-English data. Our spelling-based correction yields an improvement of 1.7 BLEU points over the baseline system, which translates unedited Egyptian into English.
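The spelling-correction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how candidates are generated, so the code below assumes a plain Levenshtein edit distance against a known MSA vocabulary; it is not the authors' method. The function names `edit_distance` and `propose_candidates`, the `max_dist` threshold, and the toy Latin-script tokens are all hypothetical.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def propose_candidates(oov_word: str,
                       msa_vocab: Iterable[str],
                       max_dist: int = 1) -> List[str]:
    """Return in-vocabulary words within `max_dist` edits of an
    out-of-vocabulary word, closest first (assumed heuristic)."""
    scored = [(edit_distance(oov_word, w), w) for w in msa_vocab]
    return [w for d, w in sorted(scored) if d <= max_dist]


# Toy placeholder tokens, not real corpus data.
toy_vocab = ["kitab", "kalb", "bab"]
print(propose_candidates("ktab", toy_vocab))
```

In a full pipeline, each proposed candidate would replace the unknown word before the MSA-to-English decoder runs; how candidates are ranked or filtered by context is left open here.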
Pages: 271-282
Page count: 12