Ancient Korean Neural Machine Translation

Cited by: 15
Authors
Park, Chanjun [1]
Lee, Chanhee [1, 2]
Yang, Yeongwook [3]
Lim, Heuiseok [1]
Affiliations
[1] Korea Univ, Dept Comp Sci & Engn, Seoul 02841, South Korea
[2] Amazon Alexa AI, Seattle, WA 98109 USA
[3] Univ Tartu, Inst Educ, Ctr Educ Technol, EE-50090 Tartu, Estonia
Funding
National Research Foundation of Singapore;
Keywords
Ancient Korean translation; neural machine translation; transformer; subword tokenization; share vocabulary and entity restriction byte pair encoding; MUMMIES;
DOI
10.1109/ACCESS.2020.3004879
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Translations of ancient languages can serve as source material for various digital media and can be useful in fields such as natural phenomena, medicine, and science. These needs have driven a global movement to translate ancient languages, but the work requires experts. Language experts are difficult to train and, more importantly, manual translation is slow. Consequently, the recovery of ancient characters using machine translation has recently been investigated, but there is currently no literature on the machine translation of ancient Korean. This paper proposes the first ancient Korean neural machine translation model, built on a Transformer. The model can improve a translator's efficiency by quickly providing draft translations of untranslated ancient documents. Furthermore, a new subword tokenization method, Share Vocabulary and Entity Restriction Byte Pair Encoding, is proposed based on the characteristics of ancient Korean sentences. The proposed method improves on conventional subword tokenization methods such as byte pair encoding by 5.25 BLEU points. In addition, decoding strategies such as n-gram blocking and ensemble models further improve performance by 2.89 BLEU points. The model has been made publicly available as a software application.
Pages: 116617-116625
Number of pages: 9