Generation of Compound Words in Statistical Machine Translation into Compounding Languages

Cited by: 5
Authors
Stymne, Sara [1]
Cancedda, Nicola [2]
Ahrenberg, Lars [3]
Affiliations
[1] Uppsala Univ, Dept Linguist & Philol, S-75126 Uppsala, Sweden
[2] Xerox Res Ctr Europe, F-38240 Meylan, France
[3] Linkoping Univ, Dept Comp & Informat Sci, S-58183 Linkoping, Sweden
DOI
10.1162/COLI_a_00162
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this article we investigate statistical machine translation (SMT) into Germanic languages, with a focus on compound processing. Our main goal is to enable the generation of novel compounds that have not been seen in the training data. We adopt a split-merge strategy, where compounds are split before training the SMT system, and merged after the translation step. This approach reduces sparsity in the training data, but runs the risk of placing translations of compound parts in non-consecutive positions. It also requires a postprocessing step of compound merging, where compounds are reconstructed in the translation output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order and show that it can lead to improvements both by direct inspection and in terms of standard translation evaluation metrics. We also propose several new methods for compound merging, based on heuristics and machine learning, which outperform previously suggested algorithms. These methods can produce novel compounds and a translation with at least the same overall quality as the baseline. For all subtasks we show that it is useful to include part-of-speech based information in the translation process, in order to handle compounds.
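The split-merge idea described in the abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the `@@` marker, the `known_parts` lexicon, and both function names are hypothetical. It marks non-final compound parts at split time and rejoins marked tokens after translation, whereas the article's merging methods rely on heuristics and machine learning.

```python
# Toy illustration only: the "@@" marker, the lexicon of known parts, and
# both helper functions are assumptions, not taken from the article.
MARK = "@@"


def split_compounds(tokens, known_parts, min_len=3):
    """Split a token into two known parts, marking the non-final part."""
    out = []
    for tok in tokens:
        for i in range(min_len, len(tok) - min_len + 1):
            head, tail = tok[:i], tok[i:]
            if head in known_parts and tail in known_parts:
                out.extend([head + MARK, tail])
                break
        else:
            # No valid split point found: keep the token unchanged.
            out.append(tok)
    return out


def merge_compounds(tokens):
    """Join every marked token with the token that follows it."""
    out, prefix = [], ""
    for tok in tokens:
        if tok.endswith(MARK):
            prefix += tok[: -len(MARK)]
        else:
            out.append(prefix + tok)
            prefix = ""
    if prefix:
        # A marked part at the end of the sentence is kept as a plain word.
        out.append(prefix)
    return out


# Splitting before training, shown on lowercased German tokens.
print(split_compounds(["der", "handschuh"], {"hand", "schuh"}))
# ['der', 'hand@@', 'schuh']

# Merging after translation can yield a compound never seen as a whole word.
print(merge_compounds(["hand@@", "tasche"]))
# ['handtasche']
```

Because merging only looks at adjacent tokens, this sketch works only when the parts land in contiguous positions and in the right order, which is the risk the abstract identifies.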
Pages: 1067-1108
Page count: 42