Code-switching input for machine translation: a case study of Vietnamese-English data

Cited by: 3
Authors
Nguyen, Li [1, 2]
Mayeux, Oliver [3]
Yuan, Zheng [4]
Affiliations
[1] Univ Cambridge, Inst Automated Language Teaching & Assessment ALTA, Cambridge CB3 0FD, England
[2] FPT Univ, Linguist & Language Technol Lab, Ho Chi Minh City 721400, Vietnam
[3] Univ Cambridge, Trinity Coll, Cambridge, England
[4] Kings Coll London, Dept Informat, London, England
Keywords
Machine translation; code-switching; Vietnamese; human evaluation; automatic evaluation; lexico-semantic enrichment
DOI
10.1080/14790718.2023.2224013
Chinese Library Classification (CLC) number
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
Multilingualism presents both a challenge and an opportunity for Natural Language Processing, with code-switching representing a particularly interesting problem for computational models trained on monolingual datasets. In this paper, we explore how code-switched data affects the task of Machine Translation, a task that has only recently started to tackle the challenge of multilingual data. We test three Machine Translation systems on data from the Canberra Vietnamese-English Codeswitching Natural Speech Corpus (CanVEC) and evaluate translation output using both automatic and human metrics. We find that, perhaps counter-intuitively, Machine Translation performs better on code-switching input than on monolingual input. In particular, a comparison of human and automatic evaluation suggests that code-switching input may boost the semantic faithfulness of the translation output, an effect we term lexico-semantic enrichment. We also report two cases where this effect is most and least clear in Vietnamese-English, namely gender-neutral 3SG pronouns and interrogative constructions, respectively. Overall, we suggest that Machine Translation, and Natural Language Processing more generally, ought to view multilingualism as an opportunity rather than an obstacle.
Pages: 22