Machine Translation on a Parallel Code-Switched Corpus

Cited by: 6
Authors
Menacer, M. A. [1]
Langlois, D. [1]
Jouvet, D. [1]
Fohr, D. [1]
Mella, O. [1]
Smaili, K. [1]
Affiliation
[1] LORIA, Campus Sci,BP 239, F-54506 Vandoeuvre Les Nancy, France
Source
Keywords
Code-switching; Machine translation; Statistical machine translation; Neural machine translation;
DOI
10.1007/978-3-030-18305-9_40
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Code-switching (CS) is the phenomenon that occurs when a speaker alternates between two or more languages within an utterance or discourse. In this work, we investigate the existence of code-switching in formal text, namely the proceedings of multilingual institutions. Our study is carried out on Arabic-English code-mixing in a parallel corpus extracted from official documents of the United Nations. We build a parallel code-switched corpus with two reference translations, one in pure Arabic and the other in pure English. We also carry out a human evaluation of this resource, with the aim of using it to evaluate the translation of code-switched documents. To the best of our knowledge, no corpus of this kind currently exists, which makes the one we propose unique. This paper examines several methods for translating a code-switched corpus: conventional statistical machine translation, end-to-end neural machine translation, and multitask learning.
Pages: 426-432
Number of pages: 7
Related papers
50 in total
  • [1] Exploring Enhanced Code-Switched Noising for Pretraining in Neural Machine Translation
    Iyer, Vivek
    Oncevay, Arturo
    Birch, Alexandra
    [J]. 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 984 - 998
  • [2] An Algerian Arabic-French Code-Switched Corpus
    Cotterell, Ryan
    Renduchintala, Adithya
    Saphra, Naomi
    Callison-Burch, Chris
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014,
  • [3] An Arabic-Moroccan Darija Code-Switched Corpus
    Samih, Younes
    Maier, Wolfgang
    [J]. LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 4170 - 4175
  • [4] DZDC12: a new multipurpose parallel Algerian Arabizi–French code-switched corpus
    Kheireddine Abainia
    [J]. Language Resources and Evaluation, 2020, 54 : 419 - 455
  • [5] From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text
    Tarunesh, Ishan
    Kumar, Syamantak
    Jyothi, Preethi
    [J]. 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 3154 - 3169
  • [6] DZDC12: a new multipurpose parallel Algerian Arabizi-French code-switched corpus
    Abainia, Kheireddine
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2020, 54 (02) : 419 - 455
  • [7] ArzEn: A Speech Corpus for Code-switched Egyptian Arabic-English
    Hamed, Injy
    Ngoc Thang Vu
    Abdennadher, Slim
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4237 - 4246
  • [8] Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text
    Gaser, Marwa
    Mager, Manuel
    Hamed, Injy
    Habash, Nizar
    Abdennadher, Slim
    Vu, Ngoc Thang
    [J]. 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 3523 - 3538
  • [9] Improving Low Resource Code-switched ASR using Augmented Code-switched TTS
    Sharma, Yash
    Abraham, Basil
    Taneja, Karan
    Jyothi, Preethi
    [J]. INTERSPEECH 2020, 2020, : 4771 - 4775
  • [10] The phonetics of code-switched vowels
    Muldner, Kasia
    Hoiting, Leah
    Sanger, Leyna
    Blumenfeld, Lev
    Toivonen, Ida
    [J]. INTERNATIONAL JOURNAL OF BILINGUALISM, 2019, 23 (01) : 37 - 52