Hybrid data-driven models of machine translation

Cited by: 12
Authors
Groves, Declan [1]
Way, Andy [1]
Affiliation
[1] Dublin City Univ, Sch Comp, Dublin 9, Ireland
Keywords
Hybrid; Example-based MT; Statistical MT; Statistical language models; Convergence; Chunk coverage; Europarl corpus;
DOI
10.1007/s10590-006-9015-5
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents an extended, harmonised account of our previous work on combining subsentential alignments from phrase-based statistical machine translation (SMT) and example-based MT (EBMT) systems to create novel hybrid data-driven systems capable of outperforming the baseline SMT and EBMT systems from which they were derived. In previous work, we demonstrated that while an EBMT system is capable of outperforming a phrase-based SMT (PBSMT) system constructed from freely available resources, a hybrid 'example-based' SMT system incorporating marker chunks and SMT subsentential alignments is capable of outperforming both baseline translation models for French-English translation. In this paper, we show that similar gains are to be had from constructing a hybrid 'statistical' EBMT system. Unlike the previous research, here we use the Europarl training and test sets, which are fast becoming the standard data in the field. On these data sets, while all hybrid 'statistical' EBMT variants still fall short of the quality achieved by the baseline PBSMT system, we show that adding the marker chunks to create a hybrid 'example-based' SMT system outperforms the two baseline systems from which it is derived. Furthermore, we provide further evidence in favour of hybrid systems by adding an SMT target-language model to the EBMT system, and demonstrate that this too has a positive effect on translation quality. We also show that many of the subsentential alignments derived from the Europarl corpus are created by either the PBSMT or the EBMT system, but not by both. In sum, therefore, despite the obvious convergence of the two paradigms, the crucial differences between SMT and EBMT contribute positively to the overall translation quality.
The central thesis of this paper is that any researcher who continues to develop an MT system using either of these approaches will benefit further from integrating the advantages of the other model; dogged adherence to one approach will lead to inferior systems being developed.
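The idea of pooling subsentential alignments from two systems, and of checking how many are found by only one of them, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy French-English chunk pairs, and the use of plain relative-frequency estimation over the pooled counts are all assumptions made for illustration.

```python
from collections import Counter

def merge_alignments(smt_pairs, ebmt_pairs):
    """Pool (source, target) chunk pairs from both systems and estimate
    p(target | source) by relative frequency over the pooled counts.
    (Hypothetical helper; the estimation scheme is an assumption.)"""
    counts = Counter(smt_pairs) + Counter(ebmt_pairs)
    source_totals = Counter()
    for (src, _tgt), n in counts.items():
        source_totals[src] += n
    return {pair: n / source_totals[pair[0]] for pair, n in counts.items()}

def overlap(smt_pairs, ebmt_pairs):
    """Count distinct pairs found by both systems or by only one of them."""
    smt, ebmt = set(smt_pairs), set(ebmt_pairs)
    return {
        "both": len(smt & ebmt),
        "smt_only": len(smt - ebmt),
        "ebmt_only": len(ebmt - smt),
    }

# Toy data, invented for this example.
smt_pairs = [("la maison", "the house"), ("la maison", "the home"), ("bleue", "blue")]
ebmt_pairs = [("la maison", "the house"), ("dans la", "in the")]

table = merge_alignments(smt_pairs, ebmt_pairs)
stats = overlap(smt_pairs, ebmt_pairs)
```

In this toy case the pair found by both systems receives more probability mass after pooling, while the pairs unique to each system are retained rather than discarded, which is the intuition behind the hybrid phrase tables described in the abstract.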
Pages: 301-323
Page count: 23