Data Categorization and Model Weighting Approach for Language Model Adaptation in Statistical Machine Translation

Cited by: 0
Authors
AbuHamad, Mohammed [1]
Mohd, Masnizah [1]
Affiliation
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi Selangor, Malaysia
Keywords
Language model adaptation; statistical machine translation; clustering
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
A language model encapsulates semantic, syntactic, and pragmatic information about a specific task. Intelligent systems, especially natural language processing systems, can differ in performance and precision when moving across genres and domains. Researchers have therefore explored different language model adaptation strategies to overcome this effectiveness issue. Language model adaptation techniques fall into two main categories. The first comprises techniques based on data selection, in which a task-oriented corpus is extracted and used to train models for specific translations. The second focuses on developing a weighting criterion that assigns the test data to a specific model corpus. The purpose of this research is to introduce a language model adaptation approach that combines both categories (data selection and weighting criterion). The approach applies data selection for task-specific translation by dividing the corpus into smaller, topic-related corpora through clustering. We investigate how different approaches to clustering the bilingual data affect language model adaptation in terms of translation quality, using the WMT07 Europarl corpus, which includes bilingual data for English-Spanish, English-German, and English-French. A mixture of language models should assign any given data to the right language model for use in translation, according to a specific weighting criterion. The proposed language model adaptation achieved better translation quality than the baseline model in statistical machine translation (SMT).
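The two-stage idea in the abstract (cluster the corpus into topic-related sub-corpora, then weight one language model per cluster against the input) can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the language model type, or the weighting criterion, so everything below is an assumption made for illustration: a toy cosine k-means over bag-of-words vectors, add-one smoothed unigram models, and weights proportional to each model's likelihood of the input sentence. The function names, the seed indices, and the four-sentence corpus are hypothetical.

```python
import math
from collections import Counter

def bow(sentence):
    """Bag-of-words counts for a whitespace-tokenized, lowercased sentence."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(sentences, seeds, iters=10):
    """Toy k-means with cosine similarity (an assumed stand-in for the
    paper's clustering step). `seeds` are indices of the initial centroids."""
    vecs = [bow(s) for s in sentences]
    centroids = [Counter(vecs[i]) for i in seeds]
    labels = [0] * len(vecs)
    for _ in range(iters):
        # Assign each sentence to its most similar centroid.
        labels = [max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
                  for v in vecs]
        # Recompute centroids as summed counts (cosine ignores the scale).
        updated = []
        for c in range(len(centroids)):
            total = Counter()
            for v, lab in zip(vecs, labels):
                if lab == c:
                    total.update(v)
            updated.append(total if total else centroids[c])
        centroids = updated
    return labels

class UnigramLM:
    """Add-one smoothed unigram model (assumed; real SMT systems use n-gram LMs)."""
    def __init__(self, sentences, vocab):
        self.counts = Counter(w for s in sentences for w in s.lower().split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab) + 1  # +1 reserves mass for unseen words

    def prob(self, word):
        return (self.counts[word] + 1) / (self.total + self.vocab_size)

    def log_prob(self, sentence):
        return sum(math.log(self.prob(w)) for w in sentence.lower().split())

def mixture_weights(lms, sentence):
    """Assumed weighting criterion: normalized likelihood of the input
    under each cluster model (uniform prior over clusters)."""
    logs = [lm.log_prob(sentence) for lm in lms]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]  # shift by the max for stability
    z = sum(exps)
    return [e / z for e in exps]

def mixture_prob(lms, weights, word):
    """Linear interpolation of the cluster models for a single word."""
    return sum(w * lm.prob(word) for w, lm in zip(weights, lms))

# Hypothetical four-sentence corpus with two obvious topics.
corpus = [
    "the parliament voted on the budget",
    "the budget debate in parliament continued",
    "fishing quotas protect the fish stocks",
    "fish stocks and fishing quotas were discussed",
]
labels = cluster(corpus, seeds=[0, 2])
vocab = {w for s in corpus for w in s.lower().split()}
lms = [UnigramLM([s for s, lab in zip(corpus, labels) if lab == c], vocab)
       for c in range(2)]
weights = mixture_weights(lms, "parliament debate on the budget")
```

In this sketch the input sentence about the budget receives most of its weight from the model trained on the parliamentary cluster, which is the behavior the weighting criterion is meant to produce. A real system would replace the unigram models with n-gram models trained on each cluster and plug the interpolated model into the SMT decoder.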
Pages: 135-141
Number of pages: 7