Hybrid data-driven models of machine translation

Cited by: 12
Authors
Groves, Declan [1]
Way, Andy [1]
Affiliation
[1] Dublin City Univ, Sch Comp, Dublin 9, Ireland
Keywords
Hybrid; Example-based MT; Statistical MT; Statistical language models; Convergence; Chunk coverage; Europarl corpus;
DOI
10.1007/s10590-006-9015-5
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an extended, harmonised account of our previous work on combining subsentential alignments from phrase-based statistical machine translation (SMT) and example-based MT (EBMT) systems to create novel hybrid data-driven systems capable of outperforming the baseline SMT and EBMT systems from which they were derived. In previous work, we demonstrated that while an EBMT system is capable of outperforming a phrase-based SMT (PBSMT) system constructed from freely available resources, a hybrid 'example-based' SMT system incorporating marker chunks and SMT subsentential alignments is capable of outperforming both baseline translation models for French-English translation. In this paper, we show that similar gains are to be had from constructing a hybrid 'statistical' EBMT system. Unlike the previous research, here we use the Europarl training and test sets, which are fast becoming the standard data in the field. On these data sets, while all hybrid 'statistical' EBMT variants still fall short of the quality achieved by the baseline PBSMT system, we show that adding the marker chunks to create a hybrid 'example-based' SMT system outperforms the two baseline systems from which it is derived. Furthermore, we provide further evidence in favour of hybrid systems by adding an SMT target-language model to the EBMT system, and demonstrate that this too has a positive effect on translation quality. We also show that many of the subsentential alignments derived from the Europarl corpus are created by either the PBSMT or the EBMT system, but not by both. In sum, therefore, despite the obvious convergence of the two paradigms, the crucial differences between SMT and EBMT contribute positively to the overall translation quality.
The central thesis of this paper is that any researcher who continues to develop an MT system using either of these approaches will benefit further from integrating the advantages of the other model; dogged adherence to one approach will lead to inferior systems being developed.
Pages: 301-323
Page count: 23