Training Set Similarity Based Parameter Selection for Statistical Machine Translation

Cited by: 0
Authors
Shi, Xuewen [1 ]
Huang, Heyan [1 ]
Jian, Ping [1 ]
Tang, Yi-Kun [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing Engn Res Ctr High Volume Language Informa, Beijing 100081, Peoples R China
Source
WEB AND BIG DATA (APWEB-WAIM 2018), PT I | 2018 / Vol. 10987
Funding
National Natural Science Foundation of China
Keywords
Statistical machine translation; Log-linear model; Parameter selection
DOI
10.1007/978-3-319-96890-2_6
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Log-linear model based statistical machine translation (SMT) systems are usually composed of multiple feature functions, each of which is assigned a weight as a model parameter. In this paper, we consider that different input source sentences may need different model parameters. To adapt the model to different inputs, we propose a model parameter selection method for log-linear model based SMT systems. The method relies mainly on the characteristics of the feature functions themselves and makes no assumptions about unseen test sets. Experimental results on two language pairs (Zh-En and Ug-Zh) show that our method yields improvements of up to 2.4 and 2.2 BLEU points, respectively, and that the proposed method has good interpretability.
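As a rough illustration of the log-linear scoring that the abstract refers to, the sketch below scores each translation candidate as a weighted sum of feature values and picks the highest-scoring one. It covers only the generic log-linear model, not the authors' selection method, which the abstract does not specify. The feature names (`tm`, `lm`), the candidate values, and both weight sets are hypothetical and chosen only to show that different weights can prefer different candidates.

```python
from typing import Dict, List

def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values: sum_i lambda_i * h_i(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

def best_candidate(candidates: List[Dict[str, float]], weights: Dict[str, float]) -> int:
    """Return the index of the candidate with the highest log-linear score."""
    scores = [loglinear_score(c, weights) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

# Hypothetical log-domain feature values for two translation candidates.
candidates = [
    {"tm": -2.0, "lm": -6.0},  # strong translation-model score, weak language-model score
    {"tm": -4.0, "lm": -3.0},  # weaker translation-model score, stronger language-model score
]

# Two hypothetical weight sets, standing in for parameters chosen per input.
weights_a = {"tm": 1.0, "lm": 0.5}
weights_b = {"tm": 0.5, "lm": 1.0}

choice_a = best_candidate(candidates, weights_a)
choice_b = best_candidate(candidates, weights_b)
print(choice_a, choice_b)
```

With these made-up numbers the two weight sets select different candidates, which is the kind of input-dependent behavior that motivates choosing parameters per source sentence.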
Pages: 63-71
Page count: 9