Analysing terminology translation errors in statistical and neural machine translation

Cited by: 7
Authors
Haque, Rejwanul [1 ]
Hasanuzzaman, Mohammed [1 ]
Way, Andy [1 ]
Affiliation
[1] Dublin City Univ, ADAPT Ctr, Sch Comp, Dublin, Ireland
Funding
Science Foundation Ireland;
Keywords
Terminology translation; Machine translation; Phrase-based statistical machine translation; Neural machine translation; QUALITY;
DOI
10.1007/s10590-020-09251-z
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Terminology translation plays a critical role in domain-specific machine translation (MT). Phrase-based statistical MT (PB-SMT) has been the dominant approach to MT for the past 30 years, both in academia and industry. Neural MT (NMT), an end-to-end learning approach to MT, is steadily taking the place of PB-SMT. In this paper, we conduct a comparative qualitative evaluation and a comprehensive error analysis of terminology translation in PB-SMT and NMT in two translation directions: English-to-Hindi and Hindi-to-English. To the best of our knowledge, there is no gold standard available for evaluating terminology translation quality in MT. For this reason, we select an evaluation test set from a legal domain corpus and create a gold standard for evaluating terminology translation in MT. We also propose an error typology that takes the terminology translation errors in MT into consideration. We translate the sentences of the test set with our MT systems, and the terminology translations are manually classified according to the error typology. We evaluate the MT systems' performance on terminology translation and present our findings, unraveling the strengths, weaknesses, and similarities of PB-SMT and NMT in the area of term translation.
Pages: 149-195
Number of pages: 47