Evaluation of Context-dependent Phrasal Translation Lexicons for Statistical Machine Translation

Authors
Carpuat, Marine [1]
Wu, Dekai [1]
Affiliation
[1] Human Language Technology Center, Department of Computer Science and Engineering, Hong Kong University of Science and Technology (HKUST), Hong Kong, People's Republic of China
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline codes
030303; 0501; 050102
Abstract
We present a new, direct data analysis showing that dynamically built, context-dependent phrasal translation lexicons are more useful resources for phrase-based statistical machine translation (SMT) than conventional static phrasal translation lexicons, which ignore all contextual information. After several years of surprising negative results, recent work suggests that context-dependent phrasal translation lexicons are an appropriate framework for successfully incorporating Word Sense Disambiguation (WSD) modeling into SMT. However, this approach has so far been evaluated only with automatic translation quality metrics, which are important but aggregate many different factors. A direct analysis is still needed to understand how context-dependent phrasal translation lexicons affect translation quality, and whether the additional complexity they introduce is really necessary. In this paper, we focus on the impact of context-dependent translation lexicons on lexical choice in phrase-based SMT and show that they are more useful to a phrase-based SMT system than a conventional lexicon. With context modeling, a typical phrase-based SMT system uses more and longer phrases, including phrases that were seen only infrequently in training. Even when the segmentation is identical, the context-dependent lexicons yield translations that match the references more often than those obtained with conventional lexicons.
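To make the contrast between a static and a context-dependent phrasal lexicon concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' system: the abstract does not say how context is modeled, so a per-phrase naive Bayes classifier over bag-of-words context stands in for the WSD model, and the names TRAINING, build_static_lexicon, and ContextLexicon, as well as the English-French toy data, are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (source phrase, context words, translation chosen in the parallel text).
TRAINING = [
    ("bank", ["river", "water"], "rive"),
    ("bank", ["money", "loan"], "banque"),
    ("bank", ["money", "account"], "banque"),
    ("bank", ["deposit", "money"], "banque"),
]


def build_static_lexicon(examples):
    """Conventional static lexicon: p(e | f) by relative frequency, context ignored."""
    counts = defaultdict(Counter)
    for phrase, _context, translation in examples:
        counts[phrase][translation] += 1
    return {
        phrase: {t: c / sum(cnt.values()) for t, c in cnt.items()}
        for phrase, cnt in counts.items()
    }


class ContextLexicon:
    """Hypothetical context-dependent lexicon: p(e | f, context) via per-phrase naive Bayes."""

    def __init__(self, examples, alpha=1.0):
        self.alpha = alpha                  # add-alpha smoothing for context-word counts
        self.prior = defaultdict(Counter)   # phrase -> translation counts
        self.feats = defaultdict(Counter)   # (phrase, translation) -> context-word counts
        self.vocab = set()
        for phrase, context, translation in examples:
            self.prior[phrase][translation] += 1
            for w in context:
                self.feats[(phrase, translation)][w] += 1
                self.vocab.add(w)

    def translate_probs(self, phrase, context):
        """Return a normalized distribution over translations of `phrase` in this context."""
        cands = self.prior.get(phrase)      # .get avoids creating an empty entry
        if not cands:
            return {}                       # unseen phrase: no candidates
        total = sum(cands.values())
        v = len(self.vocab)
        log_scores = {}
        for t, c in cands.items():
            fc = self.feats[(phrase, t)]
            denom = sum(fc.values()) + self.alpha * v
            score = math.log(c / total)     # log prior p(e | f)
            for w in context:               # add smoothed log p(w | e, f) per context word
                score += math.log((fc[w] + self.alpha) / denom)
            log_scores[t] = score
        # Normalize in log space (log-sum-exp) to obtain probabilities.
        m = max(log_scores.values())
        z = sum(math.exp(s - m) for s in log_scores.values())
        return {t: math.exp(s - m) / z for t, s in log_scores.items()}


static = build_static_lexicon(TRAINING)
ctx = ContextLexicon(TRAINING)
print(static["bank"])                               # one distribution for every context
print(ctx.translate_probs("bank", ["river", "water"]))
print(ctx.translate_probs("bank", ["money"]))
```

The static lexicon gives "bank" the same distribution everywhere, whereas the context-dependent one shifts probability toward "rive" when the context mentions a river and toward "banque" when it mentions money. How such context-dependent scores are combined with the other features of a phrase-based decoder is left out of this sketch.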
Pages: 3520-3527
Page count: 8