Creation of Comparable Corpora for English-{Urdu, Arabic, Persian}

Cited by: 0
Authors
Abouammoh, Murad [1]
Shah, Kashif [2]
Aker, Ahmet [2]
Affiliations
[1] King Saud Univ, Riyadh, Saudi Arabia
[2] Univ Sheffield, Sheffield, S Yorkshire, England
Keywords
Comparable Corpora for Arabic, Urdu, Persian and English
DOI
Not available
Chinese Library Classification number
H [Language, Writing]
Subject classification code
05
Abstract
Statistical Machine Translation (SMT) relies on the availability of rich parallel corpora. However, for under-resourced languages and some specific domains, parallel corpora are not readily available, which leads to under-performing machine translation systems in those sparse-data settings. To overcome the low availability of parallel resources, the machine translation community has recognized the potential of using comparable resources as training data. However, most efforts have focused on European languages, and fewer on Middle Eastern languages. In this study, we report comparable corpora created from news articles for the language pairs English-{Arabic, Persian, Urdu}. The data were collected over a period of one year and cover Arabic, Persian and Urdu. Furthermore, using English as a pivot language, comparable corpora that involve more than two languages can be created, e.g. English-Arabic-Persian, English-Arabic-Urdu, English-Urdu-Persian, etc. The data can be provided upon request for research purposes.
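The pivoting idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the corpora are stored or aligned, so the representation here is an assumption: each English-X corpus is taken to be a mapping from an English article ID to the IDs of its comparable articles in language X. The function name `pivot_align` and all article IDs are hypothetical.

```python
def pivot_align(en_to_x, en_to_y):
    """Join two English-anchored alignments on the shared English article ID.

    Both arguments map an English article ID to a list of comparable
    article IDs in another language (an assumed representation, not the
    authors' actual data format).
    """
    triples = []
    for en_id, x_ids in en_to_x.items():
        # Only English articles present in both corpora can act as a pivot.
        for x_id in x_ids:
            for y_id in en_to_y.get(en_id, []):
                triples.append((en_id, x_id, y_id))
    return sorted(triples)


# Hypothetical toy alignments for English-Arabic and English-Urdu.
en_ar = {"en-001": ["ar-17"], "en-002": ["ar-20", "ar-21"]}
en_ur = {"en-002": ["ur-05"], "en-003": ["ur-09"]}

triples = pivot_align(en_ar, en_ur)
for triple in triples:
    print(triple)
```

Only English articles that appear in both pairwise corpora yield a trilingual entry, so the pivoted corpus can be no larger than what the shared English side supports.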
Pages: 4193-4196
Page count: 4