Ancient Korean Neural Machine Translation

Cited by: 15
Authors
Park, Chanjun [1]
Lee, Chanhee [1, 2]
Yang, Yeongwook [3]
Lim, Heuiseok [1]
Affiliations
[1] Korea Univ, Dept Comp Sci & Engn, Seoul 02841, South Korea
[2] Amazon Alexa AI, Seattle, WA 98109 USA
[3] Univ Tartu, Inst Educ, Ctr Educ Technol, EE-50090 Tartu, Estonia
Funding
National Research Foundation of Singapore
Keywords
Ancient Korean translation; neural machine translation; transformer; subword tokenization; share vocabulary and entity restriction byte pair encoding; MUMMIES;
DOI
10.1109/ACCESS.2020.3004879
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Translating ancient languages can supply content for various digital media and can be helpful in fields such as natural phenomena, medicine, and science. Owing to these needs, there has been a global movement to translate ancient languages, but the work requires experts. Training language experts is difficult and, more importantly, manual translation is slow. Consequently, the recovery of ancient characters using machine translation has recently been investigated, but there is currently no literature on the machine translation of ancient Korean. This paper proposes the first ancient Korean neural machine translation model, built on a Transformer. The model can improve a translator's efficiency by quickly providing draft translations for a number of untranslated ancient documents. Furthermore, a new subword tokenization method, Share Vocabulary and Entity Restriction Byte Pair Encoding, is proposed based on the characteristics of ancient Korean sentences. The proposed method improves on conventional subword tokenization methods such as byte pair encoding by 5.25 BLEU points. In addition, decoding strategies such as n-gram blocking and ensemble models further improve performance by 2.89 BLEU points. The model has been made publicly available as a software application.
Pages: 116617-116625
Page count: 9
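
The abstract names n-gram blocking as one of the decoding strategies but does not describe it. The sketch below shows the general idea as it is commonly used in sequence decoding, not the authors' implementation: before choosing the next token, ban any token that would make an n-gram already present in the output appear a second time. The function names `blocked_tokens` and `pick_next`, the greedy selection, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def blocked_tokens(prefix: List[str], n: int) -> Set[str]:
    """Return the tokens that would repeat an n-gram already in `prefix`."""
    # A repeat is only possible once the prefix holds at least one full n-gram.
    if n < 1 or len(prefix) < n:
        return set()
    # The last n-1 generated tokens form the context of the next n-gram.
    context = tuple(prefix[len(prefix) - (n - 1):])
    banned: Set[str] = set()
    for i in range(len(prefix) - n + 1):
        # If an earlier n-gram starts with the same context, its final
        # token would complete a duplicate n-gram, so it is banned.
        if tuple(prefix[i:i + n - 1]) == context:
            banned.add(prefix[i + n - 1])
    return banned


def pick_next(scores: Dict[str, float], prefix: List[str], n: int) -> str:
    """Greedy choice of the best-scoring token that is not blocked."""
    banned = blocked_tokens(prefix, n)
    allowed = {tok: s for tok, s in scores.items() if tok not in banned}
    # Fall back to the unrestricted scores if every candidate is blocked.
    candidates = allowed if allowed else scores
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    generated = ["a", "b", "c", "a", "b"]
    # "a b c" already occurred, so "c" is blocked after "a b" when n = 3.
    print(blocked_tokens(generated, 3))
    print(pick_next({"c": 0.9, "d": 0.1}, generated, 3))
```

In a real decoder the same check would typically be applied per hypothesis inside beam search by setting the banned tokens' scores to negative infinity; the greedy version here only keeps the example short.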