Integrated phrase segmentation and alignment algorithm for Statistical Machine Translation

Cited by: 0
Authors
Zhang, Y [1]
Vogel, S [1]
Waibel, A [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
phrase alignment; phrase segmentation; statistical machine translation
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present an integrated phrase segmentation and alignment algorithm (ISA) for Statistical Machine Translation. Unlike other methods, ISA does not need to build an initial word-to-word alignment or to segment the monolingual text into phrases first; it segments the sentences into phrases and finds their alignments simultaneously. For each sentence pair, ISA builds a two-dimensional matrix in which the value of each cell is the point-wise mutual information (MI) between the corresponding source and target words. Aligned phrase pairs are identified from the similarity of MI values among cells. Once all the phrase pairs are found, both the segmentation of each sentence into phrases and the alignment between the source and target sentences are known. We use monolingual bigram language models to estimate the joint probabilities of the identified phrase pairs. The joint probabilities are then normalized to conditional probabilities, which are used by the decoder. Despite its simplicity, this approach yields phrase-to-phrase translations with significantly higher precision than our baseline system, in which phrase translations are extracted from the HMM word alignment. Combining the phrase-to-phrase translations generated by this algorithm with the baseline system improves translation quality even further.
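Two of the steps named in the abstract, the per-sentence MI matrix and the normalization of joint probabilities to conditional ones, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy German-English corpus, the use of sentence-level co-occurrence counts to estimate point-wise MI, and the choice to condition on the source phrase. The abstract does not specify how MI is estimated, how cells with similar MI values are grouped into phrase pairs, or how the bigram language models yield joint probabilities, so those parts are not reproduced here.

```python
import math
from collections import Counter


def count_cooccurrences(corpus):
    """Count sentence-level occurrences and co-occurrences of word types.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Assumption: each word type is counted once per sentence pair.
    """
    pair_counts, src_counts, tgt_counts = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_types, tgt_types = set(src), set(tgt)
        src_counts.update(src_types)
        tgt_counts.update(tgt_types)
        pair_counts.update((s, t) for s in src_types for t in tgt_types)
    return pair_counts, src_counts, tgt_counts


def pmi_matrix(src, tgt, pair_counts, src_counts, tgt_counts, n_pairs):
    """Build matrix[i][j] = PMI(src[i], tgt[j]) for one sentence pair.

    PMI(s, t) = log( P(s, t) / (P(s) * P(t)) ), with probabilities
    estimated as counts divided by the number of sentence pairs.
    """
    matrix = []
    for s in src:
        row = []
        for t in tgt:
            joint = pair_counts.get((s, t), 0)
            if joint == 0:
                row.append(float("-inf"))  # never seen together
            else:
                row.append(math.log(joint * n_pairs / (src_counts[s] * tgt_counts[t])))
        matrix.append(row)
    return matrix


def normalize_to_conditional(joint):
    """Turn joint scores P(src, tgt) into conditionals P(tgt | src).

    joint: dict mapping (src_phrase, tgt_phrase) -> joint probability.
    Assumption: the conditioning direction (on the source phrase) is
    chosen here for illustration only.
    """
    totals = Counter()
    for (s, _), p in joint.items():
        totals[s] += p
    return {(s, t): p / totals[s] for (s, t), p in joint.items()}


# Hypothetical toy corpus, for illustration only.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "haus"], ["a", "house"]),
]
pair_counts, src_counts, tgt_counts = count_cooccurrences(corpus)
matrix = pmi_matrix(["das", "haus"], ["the", "house"],
                    pair_counts, src_counts, tgt_counts, len(corpus))
for row in matrix:
    print(["%.3f" % v for v in row])
```

In this toy setting the diagonal cells (das/the, haus/house) receive higher MI than the off-diagonal ones, which is the kind of contrast a phrase-identification step could exploit when grouping cells.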
Pages: 567-573
Number of pages: 7