Improving statistical word alignments with morpho-syntactic transformations

Citations: 0
Authors
de Gispert, Adria [1]
Gupta, Deepa
Popovic, Maja
Lambert, Patrik
Marino, Jose B.
Federico, Marcello
Ney, Hermann
Banchs, Rafael
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
[2] Ctr Ric Sci & Tecnol, ITC, IRST, Trento, Italy
[3] Univ Aachen, Rhein Westfal TH Aachen, Lehrstuhl Informat 6, D-5100 Aachen, Germany
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a wide range of statistical word alignment experiments incorporating morphosyntactic information. By transforming the parallel corpus using information from POS-tagging, lemmatization, or stemming, we explore which linguistic information helps reduce alignment error rates. To this end, alignments are evaluated against a human word alignment reference, with the aim of an improved machine translation training scheme that eventually leads to better SMT performance. Experiments are carried out on a Spanish-English European Parliament Proceedings parallel corpus, in both a large and a small data track. As expected, the improvements from introducing morphosyntactic information are larger when data is scarce, but a significant improvement is also achieved in the large data task, meaning that certain linguistic knowledge remains relevant even when large amounts of data are available.
Pages: 368-379
Page count: 12