Recursive alignment block classification technique for word reordering in statistical machine translation

Authors
Marta R. Costa-jussà
José A. R. Fonollosa
Enric Monte
Affiliations
[1] Barcelona Media Innovation Center
[2] Universitat Politècnica de Catalunya
[3] TALP Research Center
Source
Language Resources and Evaluation, 2011, 45(2)
Keywords
Statistical machine translation; Word reordering; Statistical classification; Automatic evaluation
DOI
Not available
Abstract
Statistical machine translation (SMT) relies on alignment models that learn word correspondences between the source and target languages from bilingual corpora. These models are assumed to be capable of learning reorderings, yet differences in word order between two languages remain one of the most important sources of error in SMT. In this paper, we show that SMT can use inductive learning to solve reordering problems. Given a word alignment, we identify the pairs of consecutive source blocks (sequences of words) whose translations are swapped, i.e. those blocks which, if swapped, yield a correct monotonic translation. We then classify these pairs into groups, recursively applying a block co-occurrence criterion, in order to infer reorderings. Within a group, we allow new internal combinations so that the reordering generalizes to unseen pairs of blocks. Next, we identify the pairs of blocks in the source corpora (both training and test) that belong to the same group, swap them, and use the modified source training corpora to realign and build the final translation system. We evaluate the reordering approach in terms of both alignment and translation quality, using two state-of-the-art SMT systems: one phrase-based and one N-gram-based. Experiments on the EuroParl task show improvements of about 1 point in the standard MT evaluation metrics (mWER and BLEU).
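The first step described in the abstract, finding consecutive source blocks whose translations are swapped, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`swapped_block_pairs`, `swap_blocks`), the `max_len` block-length limit, the adjacency test on target spans, and the toy sentence pair are all assumptions made for illustration. The recursive co-occurrence classification and the generalization to unseen block pairs are not covered.

```python
from collections import defaultdict


def target_span(links, start, end):
    """Return (min, max) target positions linked to source words start..end, or None."""
    positions = [t for s in range(start, end + 1) for t in links.get(s, ())]
    return (min(positions), max(positions)) if positions else None


def swapped_block_pairs(alignment, src_len, max_len=3):
    """Find consecutive source blocks (A, B) whose target spans appear in the order B, A.

    alignment: iterable of (source_index, target_index) links (0-based).
    max_len:   hypothetical cap on block length, used only to bound the search.
    """
    links = defaultdict(set)
    for s, t in alignment:
        links[s].add(t)

    pairs = []
    for a_start in range(src_len):
        for a_end in range(a_start, min(a_start + max_len, src_len)):
            span_a = target_span(links, a_start, a_end)
            if span_a is None:
                continue
            b_start = a_end + 1
            for b_end in range(b_start, min(b_start + max_len, src_len)):
                span_b = target_span(links, b_start, b_end)
                if span_b is None:
                    continue
                # Simplifying assumption: B's target span ends exactly
                # where A's target span begins, so the blocks are swapped.
                if span_b[1] + 1 == span_a[0]:
                    pairs.append(((a_start, a_end), (b_start, b_end)))
    return pairs


def swap_blocks(tokens, pair):
    """Return a copy of tokens with the two consecutive blocks exchanged."""
    (a_start, a_end), (b_start, b_end) = pair
    return (tokens[:a_start]
            + tokens[b_start:b_end + 1]
            + tokens[a_start:a_end + 1]
            + tokens[b_end + 1:])


# Toy example (invented): "un programa ambicioso" / "an ambitious program"
source = ["un", "programa", "ambicioso"]
alignment = [(0, 0), (1, 2), (2, 1)]
pairs = swapped_block_pairs(alignment, len(source))
reordered = swap_blocks(source, pairs[0])
```

In this toy case the only detected pair is the noun and adjective, and swapping them puts the source in target word order, which is the monotonic form the abstract refers to. A fuller version would also check that no target word inside the combined span is aligned to a source word outside the two blocks.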
Pages: 165-179
Page count: 14
Related papers
50 in total
  • [1] Recursive alignment block classification technique for word reordering in statistical machine translation
    Costa-jussa, Marta R.
    Fonollosa, Jose A. R.
    Monte, Enric
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2011, 45 (02) : 165 - 179
  • [2] Using Reordering in Statistical Machine Translation based on Recursive Alignment Block Classification
    Costa-Jussa, Marta R.
    Fonollosa, Jose A. R.
    Monte, Enric
    [J]. SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1749 - 1754
  • [3] WORD REORDERING ALIGNMENT FOR COMBINATION OF STATISTICAL MACHINE TRANSLATION SYSTEMS
    Li, Maoxi
    Zong, Chengqing
    [J]. 2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 273 - 276
  • [4] A Novel Word Reordering Method for Statistical Machine Translation
    Zang, Shuo
    Zhao, Hai
    Wu, Chunyang
    Wang, Rui
    [J]. 2015 12TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2015, : 843 - 848
  • [5] Lexicalized Syntactic Reordering Framework for Word Alignment and Machine Translation
    Huang, Chung-chi
    Chen, Wei-teh
    Chang, Jason S.
    [J]. COMPUTER PROCESSING OF ORIENTAL LANGUAGES: LANGUAGE TECHNOLOGY FOR THE KNOWLEDGE-BASED ECONOMY, 2009, 5459 : 103 - 111
  • [6] Statistical machine translation decoding using target word reordering
    Tomás, J
    Casacuberta, F
    [J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, PROCEEDINGS, 2004, 3138 : 734 - 743
  • [7] Word Reordering Approaches for Bangla-English Statistical Machine Translation
    Roy, Maxim
    Popowich, Fred
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2010, 6085 : 282 - 285
  • [8] Improved Arabic-to-English statistical machine translation by reordering post-verbal subjects for word alignment
    Carpuat, Marine
    Marton, Yuval
    Habash, Nizar
    [J]. MACHINE TRANSLATION, 2012, 26 (1-2) : 105 - 120
  • [9] Measuring word alignment quality for statistical machine translation
    Fraser, Alexander
    Marcu, Daniel
    [J]. COMPUTATIONAL LINGUISTICS, 2007, 33 (03) : 293 - 303
  • [10] HMM word and phrase alignment for statistical machine translation
    Deng, Yonggang
    Byrne, William
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (03) : 494 - 507