A Richly Annotated, Multilingual Parallel Corpus for Hybrid Machine Translation

Cited by: 0
Authors
Avramidis, Eleftherios
Costa-Jussa, Marta R. [1]
Federmann, Christian
Melero, Maite [1]
Pecina, Pavel
van Genabith, Josef
Affiliation
[1] Barcelona Media, Barcelona, Spain
Keywords
Machine Translation; System Combination; Annotated Corpus
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
In recent years, machine translation (MT) research has investigated how hybrid machine translation and system combination approaches can be designed so that the resulting hybrid translations improve on the individual "component" translations. As a first step towards this objective, we have developed a parallel corpus containing source text and the corresponding translation output of several MT engines, annotated with metadata that captures aspects of the translation process performed by the different MT systems. The corpus is intended as a basic resource for further research into whether hybrid MT algorithms and system combination techniques can benefit from the additional (linguistically motivated, decoding, and runtime) information provided by the individual systems. In this paper, we describe the annotated corpus we have created, give an overview of the component MT systems and of the XLIFF-based annotation format we have developed, and report on first experiments with the ML4HMT corpus data.
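The abstract does not spell out the annotation schema, so what follows is only a rough illustration of how a system-combination tool might read such a corpus. It assumes, hypothetically, that each MT system's output for a segment is stored in an XLIFF 1.2 `<alt-trans>` element identified by its `origin` attribute, and that per-system metadata lives in a made-up extension namespace (`urn:example:hypothetical-mt-metadata`, element `m:feature`). The actual ML4HMT format may differ; the sample segment, system names, and scores are invented for the example.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 is a real OASIS namespace; the metadata namespace and its
# <m:feature> element are hypothetical placeholders, NOT the ML4HMT schema.
XLF = "urn:oasis:names:tc:xliff:document:1.2"
META = "urn:example:hypothetical-mt-metadata"
NS = {"x": XLF, "m": META}

# Invented segment with outputs from two hypothetical MT systems.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:m="urn:example:hypothetical-mt-metadata">
  <file original="sample" source-language="de" target-language="en" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Das Haus ist klein.</source>
        <alt-trans origin="system-A">
          <target>The house is small.</target>
          <m:feature name="decoder-score" value="-4.2"/>
        </alt-trans>
        <alt-trans origin="system-B">
          <target>The house is little.</target>
          <m:feature name="decoder-score" value="-5.7"/>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xml_text):
    """Return a list of (unit_id, source, {system: (target, features)})."""
    root = ET.fromstring(xml_text)
    units = []
    for tu in root.iterfind(".//x:trans-unit", NS):
        source = tu.findtext("x:source", default="", namespaces=NS)
        outputs = {}
        # One <alt-trans> per component system; "origin" names the system.
        for alt in tu.iterfind("x:alt-trans", NS):
            system = alt.get("origin", "unknown")
            target = alt.findtext("x:target", default="", namespaces=NS)
            # Collect the (hypothetical) numeric metadata attached to this output.
            features = {
                f.get("name"): float(f.get("value"))
                for f in alt.iterfind("m:feature", NS)
            }
            outputs[system] = (target, features)
        units.append((tu.get("id"), source, outputs))
    return units


for unit_id, source, outputs in read_units(SAMPLE):
    print(unit_id, source)
    for system, (target, feats) in sorted(outputs.items()):
        print(f"  {system}: {target} {feats}")
```

Grouping each system's output with its metadata per source segment is the shape a combination step would naturally consume; how the real corpus encodes linguistic, decoding, and runtime information is not described in this record.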
Pages: 2189-2193
Page count: 5