Linguistic Divergence of Sinhala and Tamil languages in Machine Translation

Cited by: 0
Authors
Dilshani, W. S. N. [1]
Yashothara, S. [1]
Uthayasanker, R. T. [1]
Jayasena, S. [1]
Affiliation
[1] Univ Moratuwa, Dept Comp Sci & Engn, Moratuwa, Sri Lanka
Keywords
Language Divergence; Sinhala; Tamil; Dorr's classification; NLP; translation challenges;
DOI
Not available
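Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes
0808; 0809;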
Abstract
This paper presents a study of the lexical-semantic divergence between the Sinhala and Tamil languages. The study of divergence is critical because differences in the linguistic and extra-linguistic features of languages play pivotal roles in translation. This research is the first study of the divergence between Sinhala and Tamil and is based on Dorr's classification. We propose a computer-assisted divergence study procedure using statistical machine translation, which is easy to apply and performs well compared to traditional approaches. Accordingly, this research has the twin aims of revisiting the classification of divergence types outlined by Dorr and outlining some new divergence patterns specific to Sinhala and Tamil. The study also proposes a rule-based algorithm for classifying divergences.
Pages: 13-18
Page count: 6