Data Categorization and Model Weighting Approach for Language Model Adaptation in Statistical Machine Translation

Authors
AbuHamad, Mohammed [1]
Mohd, Masnizah [1]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi Selangor, Malaysia
Keywords
Language model adaptation; statistical machine translation; clustering
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
A language model encapsulates semantic, syntactic and pragmatic information about a specific task. Intelligent systems, especially natural language processing systems, can show different results in terms of performance and precision when moving among genres and domains. Researchers have therefore explored different language model adaptation strategies to overcome this effectiveness issue. Language model adaptation techniques fall into two main categories. The first category comprises techniques based on data selection, in which a task-oriented corpus is extracted and used to train models for specific translations. The second category focuses on developing a weighting criterion that assigns the test data to a specific model corpus. The purpose of this research is to introduce a language model adaptation approach that combines both categories (data selection and weighting criterion). The approach applies data selection for task-specific translation by dividing the corpus into smaller, topic-related corpora using a clustering process. We investigate the effect of different approaches to clustering the bilingual data on the language model adaptation process in terms of translation quality, using the WMT07 Europarl corpus, which includes bilingual data for English-Spanish, English-German and English-French. A mixture of language models should assign any given data to the right language model for use in the translation process, according to a specific weighting criterion. The proposed language model adaptation achieves better translation quality than the baseline model in Statistical Machine Translation (SMT).
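The two ingredients named in the abstract, clustering the corpus into topic-related sub-corpora and weighting the resulting language models for a given input, can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the language model type, or the weighting criterion, so everything below is an assumption chosen for illustration: a toy k-means-style clustering over bag-of-words vectors, add-one-smoothed unigram models, and weights proportional to each model's likelihood of the input sentence. The four-sentence corpus, the seed indices, and all function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(sentences, seeds, iterations=5):
    # Toy k-means-style clustering (assumption: the paper's method may differ).
    # `seeds` are indices of sentences used as initial centroids.
    vectors = [Counter(tokenize(s)) for s in sentences]
    centroids = [Counter(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(len(centroids)),
                      key=lambda k: cosine(v, centroids[k]))
                  for v in vectors]
        for k in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == k]
            if members:
                centroids[k] = sum(members, Counter())
    return labels

def train_unigram(sentences, vocab_size):
    # Add-one-smoothed unigram model; one extra slot covers unseen words.
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values()) + vocab_size + 1
    def prob(word):
        return (counts[word] + 1) / total
    return prob

def mixture_weights(sentence, models):
    # Weight each cluster model by its normalized likelihood of the sentence
    # (assumption: one possible weighting criterion, with a uniform prior).
    log_liks = [sum(math.log(m(w)) for w in tokenize(sentence)) for m in models]
    top = max(log_liks)
    scores = [math.exp(l - top) for l in log_liks]
    z = sum(scores)
    return [s / z for s in scores]

# Hypothetical toy corpus standing in for the target side of a bilingual corpus.
corpus = [
    "the parliament votes on the budget",
    "the budget debate in parliament continues",
    "fishing quotas protect fish stocks",
    "fish stocks and fishing fleets decline",
]
labels = cluster(corpus, seeds=[0, 2])
vocab_size = len({w for s in corpus for w in tokenize(s)})
models = [
    train_unigram([s for s, l in zip(corpus, labels) if l == k], vocab_size)
    for k in range(2)
]
weights = mixture_weights("parliament debate on the budget", models)
print(labels, weights)
```

In a full SMT pipeline the weights would typically feed a linear interpolation of the cluster-specific language models during decoding; the sketch stops at computing the weights.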
Pages: 135-141
Page count: 7