North Korean Neural Machine Translation through South Korean Resources

Authors
Kim, Hwichan [1]
Hirasawa, Tosho [1]
Moon, Sangwhan [2]
Okazaki, Naoaki [2]
Komachi, Mamoru [1]
Affiliations
[1] Tokyo Metropolitan Univ, 6-6 Asahigaoka, Hino, Tokyo 1910065, Japan
[2] Tokyo Inst Technol, 2-12-1 Ookayama, Meguro, Tokyo 1528550, Japan
Keywords
Low resource; parallel data construction; pre-process; North Korean machine translation
DOI
10.1145/3608947
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
South and North Korea both use the Korean language. However, Korean natural language processing (NLP) research has mostly focused on the South Korean variety. As a result, existing Korean NLP systems, such as neural machine translation (NMT) systems, cannot properly process North Korean inputs. Training a model on North Korean data is the most straightforward solution, but the available data are insufficient to train NMT models. To address this, we constructed a parallel corpus from a comparable corpus to develop a North Korean NMT model. We manually aligned parallel sentences to create evaluation data and automatically aligned the remaining sentences to create training data. We trained a North Korean NMT model on our North Korean parallel data and improved North Korean translation quality using South Korean resources such as parallel data and a pre-trained model. In addition, we propose Korean-specific pre-processing methods, namely character tokenization and phoneme decomposition, to use the South Korean resources more efficiently. We demonstrate that phoneme decomposition consistently improves North Korean translation accuracy compared with other pre-processing methods.
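The two pre-processing ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the "▁" space marker, and the use of Unicode NFD normalization to split Hangul syllables into conjoining jamo are all assumptions made here for illustration. The word pair 로동 (North Korean spelling) and 노동 (South Korean spelling) of "labor" is likewise an example chosen for this sketch, not one taken from the abstract.

```python
import unicodedata

SPACE_MARK = "▁"  # assumed marker for word boundaries, not from the paper


def char_tokenize(sentence: str) -> list[str]:
    """Split a sentence into syllable-level characters (hypothetical helper)."""
    composed = unicodedata.normalize("NFC", sentence)
    return [SPACE_MARK if ch == " " else ch for ch in composed]


def phoneme_decompose(sentence: str) -> str:
    """Split each Hangul syllable into its jamo via canonical decomposition."""
    return unicodedata.normalize("NFD", sentence)


def phoneme_compose(jamo: str) -> str:
    """Inverse of phoneme_decompose: recombine jamo into syllables."""
    return unicodedata.normalize("NFC", jamo)


north = phoneme_decompose("로동")  # North Korean spelling
south = phoneme_decompose("노동")  # South Korean spelling

# At the syllable level the two words share one of two characters;
# at the jamo level they share four of five symbols.
shared = sum(a == b for a, b in zip(north, south))
print(len(north), len(south), shared)
print(char_tokenize("로동 신문"))
```

Under these assumptions, decomposition exposes more overlap between North and South Korean spellings than syllable-level tokens do, which is one plausible reason such a step could help a model reuse South Korean resources.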
Pages: 22