Statistical Machine Translation from and into Morphologically Rich and Low Resourced Languages

Cited by: 6
Authors
Pushpananda, Randil [1 ]
Weerasinghe, Ruvan [1 ]
Niranjan, Mahesan [2 ]
Affiliations
[1] Univ Colombo, Sch Comp, Language Technol Res Lab, Colombo, Sri Lanka
[2] Univ Southampton, Sch Elect & Comp Sci, Southampton SO17 1BJ, Hants, England
DOI
10.1007/978-3-319-18111-0_41
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we consider the challenging problem of automatic machine translation between two languages that are both morphologically rich and low-resourced: Sinhala and Tamil. We build a phrase-based Statistical Machine Translation (SMT) system and attempt to enhance it with unsupervised morphological analysis. When translating across this language pair, morphological variation produces large numbers of out-of-vocabulary (OOV) terms between the training and test sets, leading to reduced BLEU scores in evaluation. This early work shows that unsupervised morphological analysis using the Morfessor algorithm, which extracts morpheme-like units, significantly reduces the OOV problem and helps improve translation.
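The OOV effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: `toy_segment` is a hypothetical suffix splitter standing in for a learned segmenter such as Morfessor, the function names are made up for illustration, and the English toy words are invented, not Sinhala or Tamil data from the paper.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur among the training tokens."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


def toy_segment(word, suffixes=("ing", "ed", "s")):
    """Hypothetical stand-in for a learned segmenter: split off one known suffix."""
    for suf in suffixes:
        if word.endswith(suf) and len(word) > len(suf) + 1:
            return [word[: -len(suf)], "+" + suf]
    return [word]


def segment_all(tokens):
    """Replace every token by its morpheme-like pieces."""
    return [piece for tok in tokens for piece in toy_segment(tok)]


# Invented toy data: the test set reuses the same stems with different suffixes.
train = "walk walked talks jumping".split()
test = "walks talked jump talking".split()

word_oov = oov_rate(train, test)                             # word-level units
morph_oov = oov_rate(segment_all(train), segment_all(test))  # morpheme-like units

print("word-level OOV rate:", word_oov)
print("morph-level OOV rate:", morph_oov)
```

In this toy setting every test word is unseen as a whole word, yet each stem and suffix already occurs in the training data, so the OOV rate drops once both sides are segmented. Real gains depend on the segmenter and the corpus.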
Pages: 545-556
Number of pages: 12