Modelling lexical redundancy for machine translation

Citations: 0
Authors
Talbot, David [1]
Osborne, Miles [1]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh EH8 9LW, Midlothian, Scotland
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Certain distinctions made in the lexicon of one language may be redundant when translating into another language. We quantify redundancy among source types by the similarity of their distributions over target types. We propose a language-independent framework for minimising lexical redundancy that can be optimised directly from parallel text. Optimisation of the source lexicon for a given target language is viewed as model selection over a set of cluster-based translation models. Redundant distinctions between types may exhibit monolingual regularities, for example, inflexion patterns. We define a prior over model structure using a Markov random field and learn features over sets of monolingual types that are predictive of bilingual redundancy. The prior makes model selection more robust without the need for language-specific assumptions regarding redundancy. Using these models in a phrase-based SMT system, we show significant improvements in translation quality for certain language pairs.
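The idea of quantifying redundancy "by the similarity of their distributions over target types" can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use, so the Jensen-Shannon divergence below is an assumption chosen for illustration, and the German-to-English co-occurrence counts are hypothetical toy data, not results from the paper.

```python
import math
from collections import Counter


def normalise(counts):
    """Turn raw co-occurrence counts into a probability distribution."""
    total = sum(counts.values())
    return {target: c / total for target, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two dict-based distributions.

    0 means identical distributions; 1 means disjoint support.
    This measure is an illustrative assumption, not the paper's stated choice.
    """
    support = set(p) | set(q)
    mix = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in support}

    def kl(a, b):
        # Terms with zero probability in `a` contribute nothing.
        return sum(a[t] * math.log2(a[t] / b[t]) for t in a if a[t] > 0.0)

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


# Hypothetical counts of how often each source type is aligned to each target type.
aligned_counts = {
    "rote": Counter({"red": 9, "reddish": 1}),
    "roter": Counter({"red": 8, "reddish": 2}),
    "blaue": Counter({"blue": 10}),
}
dists = {src: normalise(c) for src, c in aligned_counts.items()}

# Two inflected forms with near-identical translation behaviour: low divergence.
print("rote vs roter:", js_divergence(dists["rote"], dists["roter"]))
# Two unrelated types with disjoint translations: divergence of 1 bit.
print("rote vs blaue:", js_divergence(dists["rote"], dists["blaue"]))
```

Under this sketch, a pair of source types with low divergence would be a candidate for merging into one cluster. The paper's framework goes further, treating the choice of clustering as model selection with a Markov random field prior, which this example does not attempt.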
Pages: 969-976
Page count: 8